Preface


This is the second edition of this text on survival analysis,
originally published in 1996. As in the first edition, each chapter
contains a presentation of its topic in “lecture-book” format
together with objectives, an outline, key formulae, practice
exercises, and a test. The “lecture-book” format has a
sequence of illustrations and formulae in the left column of
each page and a script in the right column. This format allows
you to read the script in conjunction with the illustrations and
formulae that highlight the main points, formulae, or examples
being presented.

This second edition has expanded the first edition by adding 
three new chapters and a revised computer appendix. The 
three new chapters are: 

Chapter 7. Parametric Survival Models 
Chapter 8. Recurrent Event Survival Analysis 
Chapter 9. Competing Risks Survival Analysis 

Chapter 7 extends survival analysis methods to a class of survival
models, called parametric models, in which the distribution
of the outcome (i.e., the time to event) is specified in
terms of unknown parameters. Many such parametric models
are accelerated failure time models, which provide an alternative
measure to the hazard ratio called the “acceleration
factor”. The general form of the likelihood for a parametric
model that allows for left, right, or interval censored data is
also described. The chapter concludes with an introduction
to frailty models.

Chapter 8 considers survival events that may occur more than
once over the follow-up time for a given subject. Such events
are called “recurrent events”. Analysis of such data can be
carried out using a Cox PH model with the data layout augmented
so that each subject has a line of data for each recurrent
event. A variation of this approach uses a stratified
Cox PH model, which stratifies on the order in which recurrent
events occur. The use of “robust variance estimates” is
recommended to adjust the variances of estimated model coefficients
for correlation among recurrent events on the same
subject.



Chapter 9 considers survival data in which each subject can
experience only one of several different types of events (“competing
risks”) over follow-up. Modeling such data can be carried
out using a Cox model, a parametric survival model, or a
model that uses cumulative incidence (rather than survival).

The Computer Appendix in the first edition of this text has
now been revised and extended to provide step-by-step instructions
for using the computer packages STATA (version
7.0), SAS (version 8.2), and SPSS (version 11.5) to carry out
the survival analyses presented in the main text. These computer
packages are described in separate self-contained sections
of the Computer Appendix, with the analysis of the same
datasets illustrated in each section. The SPIDA package used
in the first edition is no longer active and has therefore been
omitted from the appendix and computer output in the main
text.

In addition to the above new material, the original six chapters
have been modified slightly to correct errata in the first
edition, to clarify certain issues, and to add theoretical background,
particularly regarding the formulation of the (partial)
likelihood functions for the Cox PH (Chapter 3) and extended
Cox (Chapter 6) models.

The authors’ website for this textbook has the following web
link: http://www.sph.emory.edu/~dkleinb/surv2.htm

This website includes information on how to order this
second edition from the publisher and a freely downloadable
zip file containing data files for examples used in the textbook.


Suggestions for Use

This text was originally intended for self-study, but in the nine
years since the first edition was published, it has also been
effectively used as a text in a standard lecture-type classroom
format. The text may also be used to supplement material covered
in a course or to review previously learned material in
a self-instructional course or self-planned learning activity.
A more individualized learning program may be particularly
suitable to a working professional who does not have the time
to participate in a regularly scheduled course.

In working with any chapter, the learner is encouraged first to
read the abbreviated outline and the objectives and then work
through the presentation. The reader is then encouraged to
read the detailed outline for a summary of the presentation,
work through the practice exercises, and, finally, complete the
test to check what has been learned.


Recommended Preparation

The ideal preparation for this text on survival analysis is a
course on quantitative methods in epidemiology and a course
in applied multiple regression. Also, knowledge of logistic regression,
modeling strategies, and maximum likelihood techniques
is crucial for the material on the Cox and parametric
models described in Chapters 3-9.

Recommended references on these subjects, with suggested 
chapter readings are: 

Kleinbaum D, Kupper L, Muller K, and Nizam A, Applied
Regression Analysis and Other Multivariable Methods,
Third Edition, Duxbury Press, Pacific Grove, 1998, Chapters
1-16, 22-23.

Kleinbaum D, Kupper L, and Morgenstern H, Epidemiologic
Research: Principles and Quantitative Methods, John
Wiley and Sons, Publishers, New York, 1982, Chapters 20-24.

Kleinbaum D and Klein M, Logistic Regression: A Self-Learning
Text, Second Edition, Springer-Verlag Publishers,
New York, Chapters 4-7, 11.

Kleinbaum D, ActivEpi - A CD Rom Electronic Textbook on
Fundamentals of Epidemiology, Springer-Verlag Publishers,
New York, 2002, Chapters 13-15.

A first course on the principles of epidemiologic research
would be helpful, since all chapters in this text are written
from the perspective of epidemiologic research. In particular,
the reader should be familiar with the basic characteristics of
epidemiologic study designs, and should have some idea of
the frequently encountered problem of controlling for confounding
and assessing interaction/effect modification. The
above reference, ActivEpi, provides a convenient and, we hope,
enjoyable way to review epidemiology.
1. Introduction to Survival Analysis


Introduction

This introduction to survival analysis gives a descriptive
overview of the data analytic approach called survival analysis.
The overview covers the type of problem addressed by
survival analysis, the outcome variable considered, the need
to take into account “censored data,” what a survival function
and a hazard function represent, basic data layouts for
a survival analysis, the goals of survival analysis, and some
examples of survival analysis.

Because this chapter is primarily descriptive in content, no
prerequisite mathematical, statistical, or epidemiologic concepts
are absolutely necessary. A first course on the principles
of epidemiologic research would be helpful. It would also be
helpful if the reader has had some experience reading mathematical
notation and formulae.


Abbreviated Outline

The outline below gives the user a preview of the material to
be covered by the presentation. A detailed outline for review
purposes follows the presentation.

I. What is survival analysis?

II. Censored data

III. Terminology and notation

IV. Goals of survival analysis

V. Basic data layout for computer

VI. Basic data layout for understanding analysis

VII. Descriptive measures of survival experience

VIII. Example: Extended remission data

IX. Multivariable example

X. Math models in survival analysis
Objectives


Upon completing the chapter, the learner should be able to: 

1. Recognize or describe the type of problem addressed by 
a survival analysis. 

2. Define what is meant by censored data. 

3. Define or recognize right-censored data. 

4. Give three reasons why data may be censored. 

5. Define, recognize, or interpret a survivor function. 

6. Define, recognize, or interpret a hazard function. 

7. Describe the relationship between a survivor function 
and a hazard function. 

8. State three goals of a survival analysis. 

9. Identify or recognize the basic data layout for the computer;
in particular, put a given set of survival data into
this layout.

10. Identify or recognize the basic data layout, or components
thereof, for understanding modeling theory; in particular,
put a given set of survival data into this layout.

11. Interpret or compare examples of survivor curves or hazard
functions.

12. Given a problem situation, state the goal of a survival
analysis in terms of describing how explanatory variables
relate to survival time.

13. Compute or interpret average survival and/or average
hazard measures from a set of survival data.

14. Define or interpret the hazard ratio defined from comparing
two groups of survival data.
Presentation



This presentation gives a general introduction
to survival analysis, a popular data analysis approach
for certain kinds of epidemiologic and
other data. Here we focus on the problem addressed
by survival analysis, the goals of a survival
analysis, key notation and terminology, the basic
data layout, and some examples.


I. What Is Survival Analysis?

Outcome variable: Time until an
event occurs

We begin by describing the type of analytic problem
addressed by survival analysis. Generally, survival
analysis is a collection of statistical procedures
for data analysis for which the outcome variable
of interest is time until an event occurs.

[Figure: time line running from the start of follow-up to the event.]

By time, we mean years, months, weeks, or days
from the beginning of follow-up of an individual
until an event occurs; alternatively, time can refer
to the age of an individual when an event occurs.


Event: death
       disease
       relapse
       recovery

By event, we mean death, disease incidence, relapse
from remission, recovery (e.g., return to
work) or any designated experience of interest that
may happen to an individual.


Assume 1 event

> 1 event: recurrent event
or competing risk

Although more than one event may be considered
in the same analysis, we will assume that only
one event is of designated interest. When more
than one event is considered (e.g., death from any
of several causes), the statistical problem can be
characterized as either a recurrent events or a
competing risk problem, which are discussed in
Chapters 8 and 9, respectively.


Time = survival time
Event = failure

In a survival analysis, we usually refer to the time
variable as survival time, because it gives the time
that an individual has “survived” over some follow-up
period. We also typically refer to the event as
a failure, because the event of interest usually is
death, disease incidence, or some other negative
individual experience. However, survival time may
be “time to return to work after an elective surgical
procedure,” in which case failure is a positive
event.
EXAMPLE

1. Leukemia patients/time in remission (weeks)

2. Disease-free cohort/time until heart
disease (years)

3. Elderly (60+) population/time until
death (years)

4. Parolees (recidivism study)/time
until rearrest (weeks)

5. Heart transplants/time until death
(months)


Five examples of survival analysis problems are
briefly mentioned here. The first is a study that follows
leukemia patients in remission over several
weeks to see how long they stay in remission. The
second example follows a disease-free cohort of
individuals over several years to see who develops
heart disease. A third example considers a 13-year
follow-up of an elderly population (60+ years) to
see how long subjects remain alive. A fourth example
follows newly released parolees for several
weeks to see whether they get rearrested. This type
of problem is called a recidivism study. The fifth
example traces how long patients survive after receiving
a heart transplant.


All of the above examples are survival analysis
problems because the outcome variable is time
until an event occurs. In the first example, involving
leukemia patients, the event of interest (i.e.,
failure) is “going out of remission,” and the outcome
is “time in weeks until a person goes out
of remission.” In the second example, the event
is “developing heart disease,” and the outcome is
“time in years until a person develops heart disease.”
In the third example, the event is “death”
and the outcome is “time in years until death.”
Example four, a sociological rather than a medical
study, considers the event of recidivism (i.e.,
getting rearrested), and the outcome is “time in
weeks until rearrest.” Finally, the fifth example
considers the event “death,” with the outcome being
“time until death (in months from receiving a
transplant).”


We will return to some of these examples later in 
this presentation and in later presentations. 


II. Censored Data

Censoring: don’t know survival
time exactly

Most survival analyses must consider a key
analytical problem called censoring. In essence,
censoring occurs when we have some information
about individual survival time, but we don’t know
the survival time exactly.
As a simple example of censoring, consider
leukemia patients followed until they go out of remission,
shown here as X. If for a given patient,
the study ends while the patient is still in remission
(i.e., doesn’t get the event), then that patient’s survival
time is considered censored. We know that,
for this person, the survival time is at least as long
as the period that the person has been followed,
but if the person goes out of remission after the
study ends, we do not know the complete survival
time.


Why censor? 

1. study ends—no event 

2. lost to follow-up 

3. withdraws 


There are generally three reasons why censoring
may occur:

(1) a person does not experience the event before
the study ends;

(2) a person is lost to follow-up during the study
period;

(3) a person withdraws from the study because
of death (if death is not the event of interest) or
some other reason (e.g., adverse drug reaction
or other competing risk).


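EXAMPLE

[Figure: follow-up of six persons over a 12-week study, with the time
axis in weeks (2 to 12); X marks an event.
A: T = 5, event (X)
B: T = 12, study end
C: T = 3.5, withdrawn
D: T = 8, study end
E: T = 6, lost
F: T = 3.5, event (X)]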

These situations are graphically illustrated here.
The graph describes the experience of several persons
followed over time. An X denotes a person
who got the event.

Person A, for example, is followed from the start 
of the study until getting the event at week 5; his 
survival time is 5 weeks and is not censored. 

Person B also is observed from the start of the 
study but is followed to the end of the 12-week 
study period without getting the event; the survival 
time here is censored because we can say only that 
it is at least 12 weeks. 


Person C enters the study between the second and 
third week and is followed until he withdraws 
from the study at 6 weeks; this person’s survival 
time is censored after 3.5 weeks. 


Person D enters at week 4 and is followed for the 
remainder of the study without getting the event; 
this person’s censored time is 8 weeks. 
Person E enters the study at week 3 and is followed
until week 9, when he is lost to follow-up;
his censored time is 6 weeks.

Person F enters at week 8 and is followed until 
getting the event at week 11.5. As with person A, 
there is no censoring here; the survival time is 
3.5 weeks. 


SUMMARY 

Event: A, F 
Censored: B, C, D, E 


In summary, of the six persons observed, two get 
the event (persons A and F) and four are censored 
(B, C, D, and E). 


Person   Survival time   Failed (1); censored (0)
A        5               1
B        12              0
C        3.5             0
D        8               0
E        6               0
F        3.5             1

A table of the survival time data for the six persons
in the graph is now presented. For each person,
we have given the corresponding survival time up
to the event’s occurrence or up to censorship. We
have indicated in the last column whether this
time was censored or not (with 1 denoting failed
and 0 denoting censored). For example, the data
for person C are a survival time of 3.5 and a censorship
indicator of 0, whereas for person F the
survival time is 3.5 and the censorship indicator is
1. This table is a simplified illustration of the type
of data to be analyzed in a survival analysis.
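
The step from the follow-up graph to the table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names are hypothetical, and person C's entry time of 2.5 weeks is inferred from the narrative ("between the second and third week" together with a censored time of 3.5 weeks).

```python
from dataclasses import dataclass


@dataclass
class FollowUp:
    person: str
    entry_week: float  # week at which follow-up starts
    exit_week: float   # week of the event or of censoring
    failed: bool       # True if the event was observed, False if censored


# Entry and exit weeks as read from the narrative; C's entry week (2.5)
# is an inference, not a value stated in the text.
records = [
    FollowUp("A", 0.0, 5.0, True),
    FollowUp("B", 0.0, 12.0, False),
    FollowUp("C", 2.5, 6.0, False),
    FollowUp("D", 4.0, 12.0, False),
    FollowUp("E", 3.0, 9.0, False),
    FollowUp("F", 8.0, 11.5, True),
]


def to_layout(recs):
    """Return (person, survival_time, status) rows; status 1 = failed, 0 = censored."""
    return [(r.person, r.exit_week - r.entry_week, 1 if r.failed else 0) for r in recs]


layout = to_layout(records)
for person, time, status in layout:
    print(f"{person}  {time:>4}  {status}")
```

Note that the survival time is measured from each person's own entry into the study, not from the calendar start of the study.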


[Figure: the follow-up graph for persons A through F, with the time axis in weeks.]


Notice in our example that for each of the four 
persons censored, we know that the person’s exact 
survival time becomes incomplete at the right side 
of the follow-up period, occurring when the study 
ends or when the person is lost to follow-up or is 
withdrawn. We generally refer to this kind of data 
as right-censored. For these data, the complete 
survival time interval, which we don't really know, 
has been cut off (i.e., censored) at the right side of 
the observed survival time interval. Although data 
can also be left-censored, most survival data is 
right-censored. 
[Figure: time line running from study start through HIV exposure to the
positive HIV test. The true survival time ends at HIV exposure; the
observed survival time ends at the positive test.]

Left-censored data can occur when a person’s true
survival time is less than or equal to that person’s
observed survival time. For example, if we are following
persons until they become HIV positive,
we may record a failure when a subject first tests
positive for the virus. However, we may not know
exactly the time of first exposure to the virus, and
therefore do not know exactly when the failure occurred.
Thus, the survival time is censored on the
left side since the true survival time, which ends
at exposure, is shorter than the follow-up time,
which ends when the subject tests positive.


III. Terminology and Notation

T = survival time (T ≥ 0)
    (random variable)

We are now ready to introduce basic mathematical
terminology and notation for survival analysis.
First, we denote by a capital T the random variable
for a person’s survival time. Since T denotes
time, its possible values include all nonnegative
numbers; that is, T can be any number equal to or
greater than zero.


t = specific value for T

Next, we denote by a small letter t any specific
value of interest for the random variable capital
T. For example, if we are interested in evaluating
whether a person survives for more than 5 years
after undergoing cancer therapy, small t equals 5;
we then ask whether capital T exceeds 5.

Finally, we let the Greek letter delta (δ) denote a
(0,1) random variable indicating either failure or
censorship. That is, δ = 1 for failure if the event
occurs during the study period, or δ = 0 if the survival
time is censored by the end of the study period.
Note that if a person does not fail, that is,
does not get the event during the study period, censorship
is the only remaining possibility for that
person’s survival time. That is, δ = 0 if and only
if one of the following happens: a person survives
until the study ends, a person is lost to follow-up,
or a person withdraws during the study period.

We next introduce and describe two quantitative 
terms considered in any survival analysis. These 
are the survivor function, denoted by S(t), and 
the hazard function, denoted by h(t). 



δ = (0, 1) random variable
  = 1 if failure
    0 if censored
      • study ends
      • lost to follow-up
      • withdraws


S(t) = survivor function
h(t) = hazard function
S(t) = P(T > t)

t    S(t)
1    S(1) = P(T > 1)
2    S(2) = P(T > 2)
3    S(3) = P(T > 3)

The survivor function S(t) gives the probability
that a person survives longer than some specified
time t: that is, S(t) gives the probability that the
random variable T exceeds the specified time t.

The survivor function is fundamental to a survival
analysis, because obtaining survival probabilities
for different values of t provides crucial summary
information from survival data.


Theoretical S(t):

[Figure: smooth survivor curve S(t) plotted against t, starting at
S(0) = 1 and falling toward S(∞) = 0.]
Theoretically, as t ranges from 0 up to infinity,
the survivor function can be graphed as a smooth
curve. As illustrated by the graph, where t identifies
the X-axis, all survivor functions have the
following characteristics:

• they are nonincreasing; that is, they head
downward as t increases;

• at time t = 0, S(t) = S(0) = 1; that is, at the
start of the study, since no one has gotten the
event yet, the probability of surviving past time
0 is one;

• at time t = ∞, S(t) = S(∞) = 0; that is, theoretically,
if the study period increased without
limit, eventually nobody would survive, so the
survivor curve must eventually fall to zero.


Note that these are theoretical properties of sur¬ 
vivor curves. 


S(t) in practice:

[Figure: step-function estimate Ŝ(t) plotted against t, from 0 to the
end of the study.]
In practice, when using actual data, we usually
obtain graphs that are step functions, as illustrated
here, rather than smooth curves. Moreover,
because the study period is never infinite in length
and there may be competing risks for failure, it is
possible that not everyone studied gets the event.
The estimated survivor function, denoted by a
caret over the S in the graph, thus may not go all
the way down to zero at the end of the study.
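
To see why actual data produce a step function, the following minimal Python sketch computes the proportion of subjects whose survival time exceeds t. It deliberately ignores censoring, which a real analysis must handle, and the survival times are hypothetical values chosen only for illustration.

```python
def empirical_survivor(times, t):
    """Proportion of subjects whose survival time exceeds t (no censoring assumed)."""
    return sum(1 for x in times if x > t) / len(times)


# Hypothetical, fully observed survival times in weeks (illustration only).
times = [2, 3, 3, 5, 8]

grid = [0, 1, 2, 3, 4, 5, 6, 7, 8]
curve = [empirical_survivor(times, t) for t in grid]
for t, s in zip(grid, curve):
    print(t, s)
```

The estimate stays flat between observed failure times and drops only at those times, which is what gives the curve its steps. In this toy data set every subject fails, so the curve reaches zero; with censored subjects it need not.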


$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$

The hazard function, denoted by h(t), is given
by the formula: h(t) equals the limit, as Δt approaches
zero, of a probability statement about
survival, divided by Δt, where Δt denotes a small
interval of time. This mathematical formula is difficult
to explain in practical terms.














h(t) = instantaneous potential 



Before getting into the specifics of the formula,
we give a conceptual interpretation. The hazard
function h(t) gives the instantaneous potential
per unit time for the event to occur, given that
the individual has survived up to time t. Note
that, in contrast to the survivor function, which
focuses on not failing, the hazard function focuses
on failing, that is, on the event occurring. Thus, in
some sense, the hazard function can be considered
as giving the opposite side of the information given
by the survivor function.



To get an idea of what we mean by instantaneous 
potential, consider the concept of velocity. If, for 
example, you are driving in your car and you see 
that your speedometer is registering 60 mph, what 
does this reading mean? It means that if in the 
next hour, you continue to drive this way, with 
the speedometer exactly on 60, you would cover 
60 miles. This reading gives the potential, at the 
moment you have looked at your speedometer, 
for how many miles you will travel in the next 
hour. However, because you may slow down or 
speed up or even stop during the next hour, the 
60-mph speedometer reading does not tell you 
the number of miles you really will cover in the 
next hour. The speedometer tells you only how 
fast you are going at a given moment; that is, the 
instrument gives your instantaneous potential or 
velocity. 


[Figure: analogy between velocity at a time point and h(t); both are
instantaneous potentials.]

Similar to the idea of velocity, a hazard function
h(t) gives the instantaneous potential at time t
for getting an event, like death or some disease
of interest, given survival up to time t. The given
part, that is, surviving up to time t, is analogous
to recognizing in the velocity example that
the speedometer reading at a point in time inherently
assumes that you have already traveled
some distance (i.e., survived) up to the time of the
reading.
Given (the condition T ≥ t):

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$

Conditional probabilities: P(A | B)

P(t ≤ T < t + Δt | T ≥ t)
= P(individual fails in the interval
[t, t + Δt] | survival up to time t)


Hazard function = conditional
failure rate

$$\lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$

Probability per unit time
Rate: 0 to ∞

P = P(t ≤ T < t + Δt | T ≥ t)

P      Δt           P/Δt = rate
1/3    1/2 day      (1/3)/(1/2) = 0.67/day
1/3    1/14 week    (1/3)/(1/14) = 4.67/week

In mathematical terms, the given part of the formula
for the hazard function is found in the probability
statement, the numerator to the right of the
limit sign. This statement is a conditional probability
because it is of the form, “P of A, given
B,” where the P denotes probability and where
the long vertical line separating A from B denotes
“given.” In the hazard formula, the conditional
probability gives the probability that a person’s
survival time, T, will lie in the time interval between
t and t + Δt, given that the survival time
is greater than or equal to t. Because of the given
sign here, the hazard function is sometimes called
a conditional failure rate.

We now explain why the hazard is a rate rather
than a probability. Note that in the hazard function
formula, the expression to the right of the
limit sign gives the ratio of two quantities. The
numerator is the conditional probability we just
discussed. The denominator is Δt, which denotes
a small time interval. By this division, we obtain a
probability per unit time, which is no longer a
probability but a rate. In particular, the scale
for this ratio is not 0 to 1, as for a probability,
but rather ranges between 0 and infinity, and depends
on whether time is measured in days, weeks,
months, or years, etc.

For example, if the probability, denoted here by
P, is 1/3, and the time interval is one-half a day,
then the probability divided by the time interval
is 1/3 divided by 1/2, which equals 0.67 per day.
As another example, suppose, for the same probability
of 1/3, that the time interval is considered
in weeks, so that 1/2 day equals 1/14 of a week.
Then the probability divided by the time interval
becomes 1/3 over 1/14, which equals 14/3, or 4.67
per week. The point is simply that the expression P
divided by Δt at the right of the limit sign does not
give a probability. The value obtained will give
a different number depending on the units of
time used, and may even give a number larger
than one.
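
The arithmetic in this example can be checked with a short Python sketch that uses exact fractions; the function name `rate` is simply an illustrative choice.

```python
from fractions import Fraction


def rate(probability, interval):
    """Probability per unit time: P divided by the length of the time interval."""
    return probability / interval


p = Fraction(1, 3)
per_day = rate(p, Fraction(1, 2))    # interval of one-half day
per_week = rate(p, Fraction(1, 14))  # the same interval expressed in weeks

print(per_day, float(per_day))
print(per_week, float(per_week))
```

The same probability over the same stretch of time yields 2/3 per day but 14/3 per week, so the value depends on the time unit and can exceed one.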
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$

Gives instantaneous potential

Hazard functions

[Figure: three example hazard curves h(t) plotted against t.]

• h(t) ≥ 0

• h(t) has no upper bound


When we take the limit of the right-side expression
as the time interval approaches zero, we are
essentially getting an expression for the instantaneous
probability of failing at time t per unit time.
Another way of saying this is that the conditional
failure rate or hazard function h(t) gives the instantaneous
potential for failing at time t per unit
time, given survival up to time t.

As with a survivor function, the hazard function
h(t) can be graphed as t ranges over various values.
The graph at the left illustrates three different hazards.
In contrast to a survivor function, the graph
of h(t) does not have to start at 1 and go down to
zero, but rather can start anywhere and go up and
down in any direction over time. In particular, for
a specified value of t, the hazard function h(t) has
the following characteristics:

• it is always nonnegative, that is, equal to or
greater than zero;

• it has no upper bound.

These two features follow from the ratio expression
in the formula for h(t), because both the probability
in the numerator and the Δt in the denominator
are nonnegative, and since Δt can range
between 0 and ∞.
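
The limiting process can be watched numerically. The Python sketch below assumes, purely for illustration, a survivor function of the form S(t) = exp(-λt) with a hypothetical λ = 2 per unit time; for a continuous survival time the conditional probability in the numerator equals [S(t) - S(t + Δt)] / S(t).

```python
import math

LAM = 2.0  # hypothetical constant hazard, in events per unit time


def survivor(t):
    """Assumed survivor function S(t) = exp(-LAM * t)."""
    return math.exp(-LAM * t)


def conditional_rate(t, dt):
    """P(t <= T < t + dt | T >= t) / dt for a continuous survival time."""
    conditional_prob = (survivor(t) - survivor(t + dt)) / survivor(t)
    return conditional_prob / dt


steps = (0.5, 0.1, 0.01, 0.001)
approximations = [conditional_rate(1.0, dt) for dt in steps]
for dt, value in zip(steps, approximations):
    print(dt, value)
```

As Δt shrinks, the ratio settles near 2, a value that is nonnegative but larger than one, even though each conditional probability in the numerator stays between 0 and 1.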


EXAMPLE

1. Constant hazard (exponential model)
[Figure: h(t) for healthy persons, a horizontal line at height λ
plotted against t.]

Now we show some graphs of different types of
hazard functions. The first graph given shows a
constant hazard for a study of healthy persons.
In this graph, no matter what value of t is specified,
h(t) equals the same value, in this example,
λ. Note that for a person who continues to be
healthy throughout the study period, his/her instantaneous
potential for becoming ill at any time
during the period remains constant throughout
the follow-up period. When the hazard function
is constant, we say that the survival model is exponential.
This term follows from the relationship
between the survivor function and the hazard
function. We will return to this relationship later.















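EXAMPLE (continued)

2. Increasing Weibull
[Figure: h(t) for leukemia patients, rising with t.]

3. Decreasing Weibull
[Figure: h(t) for persons recovering from surgery, falling with t.]

4. Lognormal
[Figure: h(t) for TB patients, first rising and then falling with t.]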


The second graph shows a hazard function that
is increasing over time. An example of this kind
of graph is called an increasing Weibull model.
Such a graph might be expected for leukemia
patients not responding to treatment, where the
event of interest is death. As survival time increases
for such a patient, and as the prognosis
accordingly worsens, the patient’s potential for dying
of the disease also increases.

In the third graph, the hazard function is decreasing
over time. An example of this kind of graph is
called a decreasing Weibull. Such a graph might
be expected when the event is death in persons
who are recovering from surgery, because the potential
for dying after surgery usually decreases as
the time after surgery increases.

The fourth graph given shows a hazard function
that is first increasing and then decreasing. An
example of this type of graph is the lognormal
survival model. We can expect such a graph for
tuberculosis patients, since their potential for dying
increases early in the disease and decreases
later.
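
The four shapes can be reproduced with standard textbook forms of these hazards. The Python sketch below is an illustration only: the parameterizations (Weibull hazard λ·p·t^(p-1), lognormal hazard as density divided by survivor function) are common conventions rather than formulas given in this chapter, and all parameter values are hypothetical.

```python
import math


def hazard_exponential(t, lam=0.5):
    """Constant hazard: the same value lam for every t."""
    return lam


def hazard_weibull(t, lam=0.5, p=2.0):
    """Weibull hazard lam * p * t**(p - 1): increasing if p > 1, decreasing if p < 1."""
    return lam * p * t ** (p - 1)


def hazard_lognormal(t, mu=0.0, sigma=1.0):
    """Lognormal hazard, computed as density divided by survivor function."""
    z = (math.log(t) - mu) / sigma
    density = math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2 * math.pi))
    surv = 0.5 * math.erfc(z / math.sqrt(2))  # 1 - Phi(z)
    return density / surv


for t in (0.1, 0.6, 1.0, 2.0, 5.0):
    print(
        t,
        hazard_exponential(t),
        round(hazard_weibull(t, p=2.0), 3),   # increasing Weibull
        round(hazard_weibull(t, p=0.5), 3),   # decreasing Weibull
        round(hazard_lognormal(t), 3),        # rises, then falls
    )
```

With these hypothetical parameters, the single shape parameter p switches the Weibull hazard between increasing and decreasing, while the lognormal hazard rises early and falls later.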


S(t): directly describes survival 
h(t): • a measure of instantaneous 
potential 

• identify specific model 
form 

• math model for survival 
analysis 


Of the two functions we have considered, S(t) and
h(t), the survivor function is more naturally appealing
for analysis of survival data, simply because
S(t) directly describes the survival experience
of a study cohort.

However, the hazard function is also of interest for
the following reasons:

• it is a measure of instantaneous potential
whereas a survival curve is a cumulative measure
over time;

• it may be used to identify a specific model
form, such as an exponential, a Weibull, or a
lognormal curve that fits one’s data;

• it is the vehicle by which mathematical modeling
of survival data is carried out; that is, the
survival model is usually written in terms of
the hazard function.
Relationship of S(t) and h(t):

If you know one, you can determine
the other.

Regardless of which function, S(t) or h(t), one
prefers, there is a clearly defined relationship
between the two. In fact, if one knows the form
of S(t), one can derive the corresponding h(t), and
vice versa. For example, if the hazard function is
constant, i.e., h(t) = λ for some specific value
λ, then it can be shown that the corresponding
survival function is given by the following formula:
S(t) equals e to the power minus λ times t,
that is, S(t) = e^(-λt).


General formulae:

$$S(t) = \exp\left[-\int_0^t h(u)\,du\right]$$

$$h(t) = -\left[\frac{dS(t)/dt}{S(t)}\right]$$


More generally, the relationship between S(t) and
h(t) can be expressed equivalently in either of two
calculus formulae shown here.

The first of these formulae describes how the survivor
function S(t) can be written in terms of an integral
involving the hazard function. The formula
says that S(t) equals the exponential of the negative
integral of the hazard function between integration
limits of 0 and t.


The second formula describes how the hazard
function h(t) can be written in terms of a
derivative involving the survivor function. This
formula says that h(t) equals minus the derivative
of S(t) with respect to t divided by S(t).



In any actual data analysis a computer program
can make the numerical transformation from S(t)
to h(t), or vice versa, without the user ever having
to use either formula. The point here is simply that
if you know either S(t) or h(t), you can get the
other directly.
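
As a numerical illustration of the two general formulae, the following minimal Python sketch assumes a constant hazard h(t) = λ. The value λ = 0.1 and all function names are assumptions made here for illustration; they do not come from the text. The sketch integrates the hazard numerically to recover S(t) and differentiates S(t) numerically to recover h(t).

```python
import math

LAMBDA = 0.1  # hypothetical constant hazard rate, chosen only for illustration


def hazard(t):
    """Constant hazard h(t) = lambda."""
    return LAMBDA


def exponential_survival(t):
    """Closed form for a constant hazard: S(t) = exp(-lambda * t)."""
    return math.exp(-LAMBDA * t)


def survival_from_hazard(h, t, steps=10_000):
    """S(t) = exp(-integral from 0 to t of h(u) du), via the trapezoidal rule."""
    if t == 0:
        return 1.0
    width = t / steps
    area = 0.0
    for i in range(steps):
        u0, u1 = i * width, (i + 1) * width
        area += 0.5 * (h(u0) + h(u1)) * width
    return math.exp(-area)


def hazard_from_survival(S, t, eps=1e-6):
    """h(t) = -[dS(t)/dt] / S(t), via a central finite difference."""
    dS_dt = (S(t + eps) - S(t - eps)) / (2 * eps)
    return -dS_dt / S(t)


s_numeric = survival_from_hazard(hazard, 5.0)
h_numeric = hazard_from_survival(exponential_survival, 5.0)
print(s_numeric, exponential_survival(5.0))  # both approximate exp(-0.5)
print(h_numeric, LAMBDA)                     # both approximate 0.1
```

Going in either direction returns the other function, which is the point made above: knowing S(t) or h(t) is enough to obtain the other.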


SUMMARY 

T = survival time random 
variable 

t = specific value of T 
δ = (0, 1) variable for failure/
censorship

S(t) = survivor function
h(t) = hazard function


At this point, we have completed our discussion
of key terminology and notation. The key notation
is T for the survival time variable, t
for a specified value of T, and δ for the dichotomous
variable indicating event occurrence
or censorship. The key terms are the
survivor function S(t) and the hazard function
h(t), which are in essence opposed concepts,
in that the survivor function focuses on
surviving whereas the hazard function focuses
on failing, given survival up to a certain time
point.
IV. Goals of Survival Analysis

We now state the basic goals of survival analysis.

Goal 1: To estimate and interpret survivor and/or 
hazard functions from survival data. 

Goal 2: To compare survivor and/or hazard func¬ 
tions. 

Goal 3: To assess the relationship of explanatory 
variables to survival time. 




Regarding the first goal, consider, for example, the 
two survivor functions pictured at the left, which 
give very different interpretations. The function 
farther on the left shows a quick drop in survival 
probabilities early in follow-up but a leveling off 
thereafter. The function on the right, in contrast, 
shows a very slow decrease in survival probabili¬ 
ties early in follow-up but a sharp decrease later 
on. 


Treatment 



We compare survivor functions for a treatment 
group and a placebo group by graphing these func¬ 
tions on the same axis. Note that up to 6 weeks, 
the survivor function for the treatment group lies 
above that for the placebo group, but thereafter 
the two functions are at about the same level. 
This dual graph indicates that up to 6 weeks the 
treatment is more effective for survival than the 
placebo but has about the same effect thereafter. 


Goal 3: Use math modeling, e.g., Cox
proportional hazards

Goal 3 usually requires using some form of mathematical
modeling, for example, the Cox proportional
hazards approach, which will be the subject
of subsequent modules.


V. Basic Data Layout 
for Computer 

Two types of data layouts: 

• for computer use 

• for understanding 


We previously considered some examples of sur¬ 
vival analysis problems and a simple data set in¬ 
volving six persons. We now consider the general 
data layout for a survival analysis. We will provide 
two types of data layouts, one giving the form ap¬ 
propriate for computer use, and the other giving 
the form that helps us understand how a survival 
analysis works. 
For computer:

[Table: basic data layout for the computer (columns Indiv. #, t, δ, X_1, X_2, ..., X_p), annotated with two example rows: (5, t_5 = 3, got event) and (8, t_8 = 3, censored).]

We start by providing, in the table shown here, the
basic data layout for the computer. Assume that we
have a data set consisting of n persons. The first
column of the table identifies each person from 1,
starting at the top, to n, at the bottom.

The remaining columns after the first one provide
survival time and other information for each person.
The second column gives the survival time
information, which is denoted t_1 for individual 1,
t_2 for individual 2, and so on, up to t_n for individual
n. Each of these t's gives the observed survival time
regardless of whether the person got the event or
is censored. For example, if person 5 got the event
at 3 weeks of follow-up, then t_5 = 3; on the other
hand, if person 8 was censored at 3 weeks, without
getting the event, then t_8 = 3 also.


To distinguish persons who get the event from
those who are censored, we turn to the third column,
which gives the information for status (i.e.,
δ), the dichotomous variable that indicates censorship
status.


                      Failure   Explanatory
                      status    variables
Indiv. #    t         δ         X_1    X_2   ...  X_p

1           t_1       δ_1       X_11   X_12  ...  X_1p

2           t_2       δ_2       X_21   X_22  ...  X_2p


Thus, δ_1 is 1 if person 1 gets the event or is 0 if
person 1 is censored; δ_2 is 1 or 0 similarly, and so
on, up through δ_n. In the example just considered,
person 5, who failed at 3 weeks, has a δ of 1; that
is, δ_5 equals 1. In contrast, person 8, who was censored
at 3 weeks, has a δ of 0; that is, δ_8 equals 0.


(5, t_5 = 3, \delta_5 = 1)

(8, t_8 = 3, \delta_8 = 0)

\sum_{j=1}^{n} \delta_j = \#\,\text{failures}

X_i = Age, E, or Age × Race


Note that if all of the δ_j in this column are added
up, their sum will be the total number of failures in
the data set. This total will be some number equal
to or less than n, because not everyone may fail.

The remainder of the information in the table
gives values for explanatory variables of interest.
An explanatory variable, X_i, is any variable like
age or exposure status, E, or a product term like
age × race that the investigator wishes to consider
to predict survival time. These variables are listed
at the top of the table as X_1, X_2, and so on, up to
X_p. Below each variable are the values observed
for that variable on each person in the data set.




















Columns (read down) and rows (read across):

   #     t       δ       X_1     X_2    ...   X_p
   1     t_1     δ_1     X_11    X_12   ...   X_1p
   2     t_2     δ_2     X_21    X_22   ...   X_2p
   .
   .
   j     t_j     δ_j     X_j1    X_j2   ...   X_jp    (row for individual j)
   .
   .
   n     t_n     δ_n     X_n1    X_n2   ...   X_np


For example, in the column corresponding to X_1
are the values observed on this variable for all n
persons. These values are denoted as X_11, X_21, and
so on, up to X_n1; the first subscript indicates the
person number, and the second subscript, a one
in each case here, indicates the variable number.
Similarly, the column corresponding to variable
X_2 gives the values observed on X_2 for all n persons.
This notation continues for the other X variables
up through X_p.

We have thus described the basic data layout by
columns. Alternatively, we can look at the table
line by line, that is, by rows. For each line or row,
we have the information obtained on a given individual.
Thus, for individual j, the observed information
is given by the values t_j, δ_j, X_j1, X_j2, etc.,
up to X_jp. This is how the information is read into
the computer, that is, line by line, until all persons
are included for analysis.


EXAMPLE

The data: Remission times (in weeks)
for two groups of leukemia patients

Group 1                     Group 2
(Treatment) n = 21          (Placebo) n = 21

6, 6, 6, 7, 10,             1, 1, 2, 2, 3,
13, 16, 22, 23,             4, 4, 5, 5,
6+, 9+, 10+, 11+,           8, 8, 8, 8,
17+, 19+, 20+,              11, 11, 12, 12,
25+, 32+, 32+,              15, 17, 22, 23
34+, 35+

+ denotes censored: in remission at study end,
lost to follow-up, or withdraws


As an example of this data layout, consider the fol¬ 
lowing set of data for two groups of leukemia pa¬ 
tients: one group of 21 persons has received a cer¬ 
tain treatment; the other group of 21 persons has 
received a placebo. The data come from Freireich 
et al., Blood, 1963. 

As presented here, the data are not yet in tabu¬ 
lar form for the computer, as we will see shortly. 
The values given for each group consist of time in 
weeks a patient is in remission, up to the point of 
the patient’s either going out of remission or being 
censored. Here, going out of remission is a failure. 
A person is censored if he or she remains in remis¬ 
sion until the end of the study, is lost to follow-up, 
or withdraws before the end of the study. The cen¬ 
sored data here are denoted by a plus sign next to 
the survival time. 
EXAMPLE (continued)

Group 1                     Group 2
(Treatment) n = 21          (Placebo) n = 21

6, 6, 6, 7, 10,             1, 1, 2, 2, 3,
13, 16, 22, 23,             4, 4, 5, 5,
6+, 9+, 10+, 11+,           8, 8, 8, 8,
17+, 19+, 20+,              11, 11, 12, 12,
25+, 32+, 32+,              15, 17, 22, 23
34+, 35+

            # failed    # censored    Total
Group 1     9           12            21
Group 2     21          0             21

GROUP 1

Indiv.    t          δ (failed or    X
(#)       (weeks)    censored)       (Group)

 1         6         1               1
 2         6         1               1
 3         6         1               1
 4         7         1               1
 5        10         1               1
 6        13         1               1
 7        16         1               1
 8        22         1               1
 9        23         1               1
10         6         0               1
11         9         0               1
12        10         0               1
13        11         0               1
14        17         0               1
15        19         0               1
16        20         0               1
17        25         0               1
18        32         0               1
19        32         0               1
20        34         0               1
21        35         0               1


Here are the data again: 

Notice that the first three persons in group 1 went 
out of remission at 6 weeks; the next six per¬ 
sons also went out of remission, but at failure 
times ranging from 7 to 23. All of the remain¬ 
ing persons in group 1 with pluses next to their 
survival times are censored. For example, on line 
three the first person who has a plus sign next to a 
6 is censored at six weeks. The remaining persons 
in group one are also censored, but at times rang¬ 
ing from 9 to 35 weeks. 

Thus, of the 21 persons in group 1, nine failed dur¬ 
ing the study period, whereas the last 12 were cen¬ 
sored. Notice also that none of the data in group 
2 is censored; that is, all 21 persons in this group 
went out of remission during the study period. 

We now put this data in tabular form for the computer,
as shown at the left. The list starts with the
21 persons in group 1 (listed 1-21) and follows
(on the next page) with the 21 persons in group
2 (listed 22-42). Our n for the composite group
is 42.

The second column of the table gives the survival 
times in weeks for all 42 persons. The third col¬ 
umn indicates failure or censorship for each per¬ 
son. Finally, the fourth column lists the values of 
the only explanatory variable we have considered 
so far, namely, group status, with 1 denoting treat¬ 
ment and 0 denoting placebo. 


If we pick out any individual and read across the
table, we obtain the line of data for that person that
gets entered in the computer. For example, person
#3 has a survival time of 6 weeks, and since δ = 1,
this person failed, that is, went out of remission.
The X value is 1 because person #3 is in group
1. As a second example, person #14, who has an
observed survival time of 17 weeks, was censored
at this time because δ = 0. The X value is again 1
because person #14 is also in group 1.
EXAMPLE (continued)

GROUP 2

Indiv.    t          δ (failed or    X
(#)       (weeks)    censored)       (Group)

22         1         1               0
23         1         1               0
24         2         1               0
25         2         1               0
26         3         1               0
27         4         1               0
28         4         1               0
29         5         1               0
30         5         1               0
31         8         1               0
32         8         1               0
33         8         1               0
34         8         1               0
35        11         1               0
36        11         1               0
37        12         1               0
38        12         1               0
39        15         1               0
40        17         1               0
41        22         1               0
42        23         1               0


As one more example, this time from group 2, person
#32 survived 8 weeks and then failed, because
δ = 1; the X value is 0 because person #32 is in
group 2.
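
The computer layout just described can be sketched in Python as a list of rows, one per person. This is only an illustrative sketch: the variable names and the tuple layout (indiv, t, delta, x) are assumptions made here, and the times are transcribed from the remission example above.

```python
# Remission times in weeks, transcribed from the example.
group1_failed = [6, 6, 6, 7, 10, 13, 16, 22, 23]
group1_censored = [6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35]
group2_failed = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8,
                 11, 11, 12, 12, 15, 17, 22, 23]

# One row per person: (indiv #, t in weeks, delta, X), where delta = 1 means
# failed, delta = 0 means censored, and X = 1 is treatment, X = 0 is placebo.
rows = []
for t in group1_failed:
    rows.append((len(rows) + 1, t, 1, 1))
for t in group1_censored:
    rows.append((len(rows) + 1, t, 0, 1))
for t in group2_failed:
    rows.append((len(rows) + 1, t, 1, 0))

n = len(rows)
# Summing the delta column gives the total number of failures.
total_failures = sum(delta for _, _, delta, _ in rows)

print(n, total_failures)
# Reading across the table for persons #3, #14, and #32:
print(rows[2], rows[13], rows[31])
```

Reading a row across gives exactly the line of data entered for that person, and the sum of the δ column is the number of failures in the composite group.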


VI. Basic Data Layout for 
Understanding Analysis 

For analysis:

Ordered failure    # of failures    # censored in            Risk set
times (t_(j))      (m_j)            [t_(j), t_(j+1)) (q_j)   R(t_(j))

t_(0) = 0          m_0 = 0          q_0                      R(t_(0))
t_(1)              m_1              q_1                      R(t_(1))
t_(2)              m_2              q_2                      R(t_(2))
.                  .                .                        .
.                  .                .                        .
t_(k)              m_k              q_k                      R(t_(k))

{t_1, t_2, ..., t_n}: unordered t's; remove the censored t's,
then order the failed t's to obtain the t_(j)

k = # of distinct times at which subjects
failed (k ≤ n)


We are now ready to look at another data layout, 
which is shown at the left. This layout helps pro¬ 
vide some understanding of how a survival analy¬ 
sis actually works and, in particular, how survivor 
curves are derived. 

The first column in this table gives ordered failure
times. These are denoted by t's with subscripts
within parentheses, starting with t_(0), then t_(1), and
so on, up to t_(k). Note that the parentheses surrounding
the subscripts distinguish ordered failure
times from the survival times previously given
in the computer layout.


To get ordered failure times from survival times, 
we must first remove from the list of unordered 
survival times all those times that are censored; we 
are thus working only with those times at which 
people failed. We then order the remaining fail¬ 
ure times from smallest to largest, and count ties 
only once. The value k gives the number of distinct 
times at which subjects failed. 
EXAMPLE

Remission Data: Group 1
(n = 21, 9 failures, k = 7)

t_(j)          m_j    q_j    R(t_(j))

t_(0) = 0      0      0      21 persons survive ≥ 0 wks
t_(1) = 6      3      1      21 persons survive ≥ 6 wks
t_(2) = 7      1      1      17 persons survive ≥ 7 wks
t_(3) = 10     1      2      15 persons survive ≥ 10 wks
t_(4) = 13     1      0      12 persons survive ≥ 13 wks
t_(5) = 16     1      3      11 persons survive ≥ 16 wks
t_(6) = 22     1      0      7 persons survive ≥ 22 wks
t_(7) = 23     1      5      6 persons survive ≥ 23 wks

Totals         9      12

Remission Data: Group 2
(n = 21, 21 failures, k = 12)

t_(j)          m_j    q_j    R(t_(j))

t_(0) = 0      0      0      21 persons survive ≥ 0 wks
t_(1) = 1      2      0      21 persons survive ≥ 1 wk
t_(2) = 2      2      0      19 persons survive ≥ 2 wks
t_(3) = 3      1      0      17 persons survive ≥ 3 wks
t_(4) = 4      2      0      16 persons survive ≥ 4 wks
t_(5) = 5      2      0      14 persons survive ≥ 5 wks
t_(6) = 8      4      0      12 persons survive ≥ 8 wks
t_(7) = 11     2      0      8 persons survive ≥ 11 wks
t_(8) = 12     2      0      6 persons survive ≥ 12 wks
t_(9) = 15     1      0      4 persons survive ≥ 15 wks
t_(10) = 17    1      0      3 persons survive ≥ 17 wks
t_(11) = 22    1      0      2 persons survive ≥ 22 wks
t_(12) = 23    1      0      1 person survives ≥ 23 wks

Totals         21     0

For example, using the remission data for group
1, we find that nine of the 21 persons failed, including
three persons at 6 weeks and one person
each at 7, 10, 13, 16, 22, and 23 weeks. These
nine failures have k = 7 distinct survival times,
because three persons had survival time 6 and we
only count one of these 6's as distinct. The first
ordered failure time for this group, denoted as
t_(1), is 6; the second ordered failure time, t_(2), is 7,
and so on up to the seventh ordered failure time
of 23.

Turning to group 2, we find that although all 
21 persons in this group failed, there are several 
ties. For example, two persons had a survival time 
of 1 week; two more had a survival time of 2 weeks; 
and so on. In all, we find that there were k = 12 dis¬ 
tinct survival times out of the 21 failures. These 
times are listed in the first column for group 2. 

Note that for both groups we inserted a row of 
data giving information at time 0. We will explain 
this insertion when we get to the third column in 
the table. 

The second column in the data layout gives frequency
counts, denoted by m_j, of those persons
who failed at each distinct failure time. When
there are no ties at a certain failure time, then
m_j = 1. Notice that in group 1, there were three
ties at 6 weeks but no ties thereafter. In group 2,
there were ties at 1, 2, 4, 5, 8, 11, and 12 weeks. In
any case, the sum of all the m_j's in this column
gives the total number of failures in the group
tabulated. This sum is 9 for group 1 and 21 for
group 2.
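
The first two columns of the analysis layout can be derived mechanically from the computer layout. The sketch below is one possible way to do it in Python; the function name `ordered_failure_counts` and the list-based data representation are assumptions made for illustration.

```python
from collections import Counter


def ordered_failure_counts(times, deltas):
    """Return [(t_(j), m_j)] for j = 0..k, with the inserted row t_(0) = 0, m_0 = 0.

    Censored times (delta == 0) are dropped first; the remaining failure
    times are ordered from smallest to largest and ties are counted once.
    """
    counts = Counter(t for t, d in zip(times, deltas) if d == 1)
    return [(0, 0)] + sorted(counts.items())


# Group 1 (treatment): 9 failures followed by 12 censored times.
g1_times = [6, 6, 6, 7, 10, 13, 16, 22, 23,
            6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35]
g1_deltas = [1] * 9 + [0] * 12

# Group 2 (placebo): all 21 persons failed.
g2_times = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8,
            11, 11, 12, 12, 15, 17, 22, 23]
g2_deltas = [1] * 21

g1_layout = ordered_failure_counts(g1_times, g1_deltas)
g2_layout = ordered_failure_counts(g2_times, g2_deltas)

k1 = len(g1_layout) - 1  # number of distinct failure times, excluding t_(0)
k2 = len(g2_layout) - 1
print(k1, g1_layout)
print(k2, g2_layout)
```

The m_j column sums to the number of failures in each group, which gives a quick check on the tabulation.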
EXAMPLE (continued)

q_j = # censored in [t_(j), t_(j+1))

Remission Data: Group 1

t_(j)          m_j    q_j    R(t_(j))

t_(0) = 0      0      0      21 persons survive ≥ 0 wks
t_(1) = 6      3      1      21 persons survive ≥ 6 wks
t_(2) = 7      1      1      17 persons survive ≥ 7 wks
t_(3) = 10     1      2      15 persons survive ≥ 10 wks
t_(4) = 13     1      0      12 persons survive ≥ 13 wks
t_(5) = 16     1      3      11 persons survive ≥ 16 wks
t_(6) = 22     1      0      7 persons survive ≥ 22 wks
t_(7) = 23     1      5      6 persons survive ≥ 23 wks

Totals         9      12

Remission Data: Group 1

#      t (weeks)    δ    X (group)

 1      6           1    1
 2      6           1    1
 3      6           1    1
 4      7           1    1
 5     10           1    1
 6     13           1    1
 7     16           1    1
 8     22           1    1
 9     23           1    1
10      6           0    1
11      9           0    1
12     10           0    1
13     11           0    1
14     17           0    1
15     19           0    1
16     20           0    1
17     25           0    1
18     32           0    1
19     32           0    1
20     34           0    1
21     35           0    1



The third column gives frequency counts, denoted
by q_j, of those persons censored in the time interval
starting with failure time t_(j) up to the next
failure time, denoted t_(j+1). Technically, because of
the way we have defined this interval in the table,
we include those persons censored at the beginning
of the interval.

For example, the remission data for group 1 includes
5 nonzero q_j's: q_1 = 1, q_2 = 1, q_3 = 2, q_5 =
3, q_7 = 5. Adding these values gives us the total
number of censored observations for group 1,
which is 12. Moreover, if we add the total number
of q's (12) to the total number of m's (9), we get the
total number of subjects in group 1, which is 21.

We now focus on group 1 to look a little closer
at the q's. At the left, we list the unordered group
1 information followed (on the next page) by the
ordered failure time information. We will go back
and forth between these two tables (and pages) as
we discuss the q's. Notice that in the table here,
one person, listed as #10, was censored at week 6.
Consequently, in the table at the top of the next
page, we have q_1 = 1, which is listed on the second
line corresponding to the ordered failure time
t_(1), which equals 6.

The next q is a little trickier; it is derived from the
person who was listed as #11 in the table here and
was censored at week 9. Correspondingly, in the
table at the top of the next page, we have q_2 = 1
because this one person was censored within the
time interval that starts at the second ordered failure
time, 7 weeks, and ends just before the third ordered
failure time, 10 weeks. We have not counted
here person #12, who was censored at week 10,
because this person's censored time is exactly at
the end of the interval. We count this person in
the following interval.

















EXAMPLE (continued)

Group 1 using ordered failure times

t_(j)          m_j    q_j    R(t_(j))

t_(0) = 0      0      0      21 persons survive ≥ 0 wks
t_(1) = 6      3      1      21 persons survive ≥ 6 wks
t_(2) = 7      1      1      17 persons survive ≥ 7 wks
t_(3) = 10     1      2      15 persons survive ≥ 10 wks
t_(4) = 13     1      0      12 persons survive ≥ 13 wks
t_(5) = 16     1      3      11 persons survive ≥ 16 wks
t_(6) = 22     1      0      7 persons survive ≥ 22 wks
t_(7) = 23     1      5      6 persons survive ≥ 23 wks

Totals         9      12



We now consider, from the table of unordered
failure times, person #12, who was censored at
10 weeks, and person #13, who was censored at
11 weeks. Turning to the table of ordered failure
times, we see that these two times are within the
third ordered time interval, which starts at and includes
the 10-week point and ends just before the
13th week. As for the remaining q's, we will let you
figure them out for practice.

One last point about the q information. We inserted
a row at the top of the data for each group
corresponding to time 0. This insertion allows for
the possibility that persons may be censored after
the start of the study but before the first failure. In
other words, it is possible that q_0 may be nonzero.
For the two groups in this example, however, no
one was censored before the first failure time.
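
The counting rule for the q's, including the convention that a person censored exactly at a failure time belongs to the interval starting at that time, can be sketched as follows. The function name `censored_counts` is an assumption made for illustration, and the sketch assumes all observed times are positive.

```python
import bisect


def censored_counts(times, deltas):
    """Return [(t_(j), q_j)] where q_j counts persons censored in [t_(j), t_(j+1)).

    The interval for the last failure time t_(k) is open-ended, and the row
    for t_(0) = 0 catches anyone censored before the first failure.
    """
    failure_times = sorted({t for t, d in zip(times, deltas) if d == 1})
    starts = [0] + failure_times  # t_(0), t_(1), ..., t_(k)
    q = [0] * len(starts)
    for t, d in zip(times, deltas):
        if d == 0:
            # Index of the last interval start that is <= t, so a time
            # censored exactly at t_(j) is counted in q_j, not q_(j-1).
            j = bisect.bisect_right(starts, t) - 1
            q[j] += 1
    return list(zip(starts, q))


# Group 1 (treatment): 9 failures followed by 12 censored times.
g1_times = [6, 6, 6, 7, 10, 13, 16, 22, 23,
            6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35]
g1_deltas = [1] * 9 + [0] * 12

g1_q = censored_counts(g1_times, g1_deltas)
print(g1_q)
```

For group 1 this places person #12 (censored at week 10) in the interval starting at 10 weeks rather than in the interval starting at 7 weeks, as described above.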

The last column in the table gives the "risk set."
The risk set is not a numerical value or count but
rather a collection of individuals. By definition,
the risk set R(t_(j)) is the collection of individuals
who have survived at least to time t_(j); that is, each
person in R(t_(j)) has a survival time that is t_(j) or
longer, regardless of whether the person has failed
or is censored.

For example, we see that at the start of the study 
everyone in group 1 survived at least 0 weeks, so 
the risk set at time 0 consists of the entire group of 
21 persons. The risk set at 6 weeks for group 1 also 
consists of all 21 persons, because all 21 persons 
survived at least as long as 6 weeks. These 21 per¬ 
sons include the 3 persons who failed at 6 weeks, 
because they survived and were still at risk just up 
to this point. 
EXAMPLE (continued)

t_(j)          m_j    q_j    R(t_(j))

t_(0) = 0      0      0      21 persons survive ≥ 0 wks
t_(1) = 6      3      1      21 persons survive ≥ 6 wks
t_(2) = 7      1      1      17 persons survive ≥ 7 wks
t_(3) = 10     1      2      15 persons survive ≥ 10 wks
t_(4) = 13     1      0      12 persons survive ≥ 13 wks
t_(5) = 16     1      3      11 persons survive ≥ 16 wks
t_(6) = 22     1      0      7 persons survive ≥ 22 wks
t_(7) = 23     1      5      6 persons survive ≥ 23 wks

Totals         9      12

[Figure: the table is shown twice. In the first copy the risk set at 7 weeks (17 persons) is circled. In the second copy the m_j and q_j entries for the rows before t_(4) = 13 are X-ed out and the risk set at 13 weeks (12 persons) is circled.]


How we work with censored data:

Use all information up to time of censorship;
don't throw away information.


Now let’s look at the risk set at 7 weeks. This set 
consists of seventeen persons in group 1 that sur¬ 
vived at least 7 weeks. We omit everyone in the 
X-ed area. Of the original 21 persons, we there¬ 
fore have excluded the three persons who failed 
at 6 weeks and the one person who was censored 
at 6 weeks. These four persons did not survive at 
least 7 weeks. Although the censored person may 
have survived longer than 7 weeks, we must ex¬ 
clude him or her from the risk set at 7 weeks be¬ 
cause we have information on this person only up 
to 6 weeks. 

To derive the other risk sets, we must exclude 
all persons who either failed or were censored 
before the start of the time interval being con¬ 
sidered. For example, to obtain the risk set at 
13 weeks for group 1, we must exclude the five 
persons who failed before, but not including, 
13 weeks and the four persons who were censored 
before, but not including, 13 weeks. Subtracting 
these nine persons from 21, leaves twelve persons 
in group 1 still at risk for getting the event at 
13 weeks. Thus, the risk set consists of these twelve 
persons. 
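
The risk-set rule above (keep everyone whose observed time is at least t_(j), whether that person later failed or was censored) can be sketched in a few lines of Python. The function name `risk_set_sizes` is an assumption made for illustration; it returns only the size of each risk set, not the collection of individuals itself.

```python
def risk_set_sizes(times, deltas):
    """Return [(t_(j), number of persons in R(t_(j)))] for j = 0..k.

    A person is in R(t_(j)) if the observed survival time is t_(j) or
    longer, regardless of whether that time is a failure or a censoring.
    """
    failure_times = sorted({t for t, d in zip(times, deltas) if d == 1})
    return [(tj, sum(1 for t in times if t >= tj))
            for tj in [0] + failure_times]


# Group 1 (treatment): 9 failures followed by 12 censored times.
g1_times = [6, 6, 6, 7, 10, 13, 16, 22, 23,
            6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35]
g1_deltas = [1] * 9 + [0] * 12

g1_risk = risk_set_sizes(g1_times, g1_deltas)
print(g1_risk)
```

At 13 weeks the count is 21 minus the five earlier failures and the four earlier censored persons, which agrees with the subtraction carried out in the text.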

The importance of the table of ordered failure 
times is that we can work with censored obser¬ 
vations in analyzing survival data. Even though 
censored observations are incomplete, in that we 
don't know a person’s survival time exactly, we can 
still make use of the information we have on a 
censored person up to the time we lose track of 
him or her. Rather than simply throw away the 
information on a censored person, we use all the 
information we have on such a person up until 
time of censorship. (Nevertheless, most survival 
analysis techniques require a key assumption that 
censoring is non-informative—censored subjects 
are not at increased risk for failure. See Chapter 9 
on competing risks for further details.) 
For example, for the three persons in group 1 who
were censored between the 16th and 20th weeks, 
there are at least 16 weeks of survival information 
on each that we don’t want to lose. These three per¬ 
sons are contained in all risk sets up to the 16th 
week; that is, they are each at risk for getting the 
event up to 16 weeks. Any survival probabilities de¬ 
termined before, and including, 16 weeks should 
make use of data on these three persons as well 
as data on other persons at risk during the first 
16 weeks. 


Having introduced the basic terminology and data 
layouts to this point, we now consider some 
data analysis issues and some additional appli¬ 
cations. 


VII. Descriptive Measures of 
Survival Experience 


EXAMPLE

Remission times (in weeks) for two
groups of leukemia patients

Group 1                     Group 2
(Treatment) n = 21          (Placebo) n = 21

6, 6, 6, 7, 10,             1, 1, 2, 2, 3,
13, 16, 22, 23,             4, 4, 5, 5,
6+, 9+, 10+, 11+,           8, 8, 8, 8,
17+, 19+, 20+,              11, 11, 12, 12,
25+, 32+, 32+,              15, 17, 22, 23
34+, 35+

\bar{T}_1 (ignoring +'s) = 17.1      \bar{T}_2 = 8.6

\bar{h}_1 = .025                     \bar{h}_2 = .115

Average hazard rate: \bar{h} = \frac{\#\,\text{failures}}{\sum_{i=1}^{n} t_i}


We first return to the remission data, again shown
in untabulated form. Inspecting the survival times
given for each group, we can see that most of the
treatment group's times are longer than most of
the placebo group's times. If we ignore the plus
signs denoting censorship and simply average all
21 survival times for each group we get an average,
denoted by T "bar," of 17.1 weeks survival for
the treatment group and 8.6 weeks for the placebo
group. Because several of the treatment group's
times are censored, this means that group 1's true
average is even larger than what we have calculated.
Thus, it appears from the data (without our
doing any mathematical analysis) that, regarding
survival, the treatment is more effective than the
placebo.

As an alternative to the simple averages that we 
have computed for each group, another descrip¬ 
tive measure of each group is the average hazard 
rate, denoted as h “bar.” This rate is defined by di¬ 
viding the total number of failures by the sum of 
the observed survival times. For group 1, h “bar” 
is 9/359, which equals .025. For group 2, h “bar” 
is 21/182, which equals .115. 
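
Both descriptive measures are simple enough to recompute directly. The following Python sketch does so for the two groups; the function names are assumptions made for illustration, and, as in the text, the plus signs are ignored when averaging the survival times.

```python
def average_survival_time(times):
    """T-bar: the plain average of observed times, ignoring censoring."""
    return sum(times) / len(times)


def average_hazard_rate(times, deltas):
    """h-bar: total number of failures divided by the sum of observed times."""
    return sum(deltas) / sum(times)


# Group 1 (treatment): 9 failures followed by 12 censored times.
g1_times = [6, 6, 6, 7, 10, 13, 16, 22, 23,
            6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35]
g1_deltas = [1] * 9 + [0] * 12

# Group 2 (placebo): all 21 persons failed.
g2_times = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8,
            11, 11, 12, 12, 15, 17, 22, 23]
g2_deltas = [1] * 21

t_bar_1 = average_survival_time(g1_times)        # 359 / 21
t_bar_2 = average_survival_time(g2_times)        # 182 / 21
h_bar_1 = average_hazard_rate(g1_times, g1_deltas)  # 9 / 359
h_bar_2 = average_hazard_rate(g2_times, g2_deltas)  # 21 / 182

print(round(t_bar_1, 2), round(t_bar_2, 2))
print(round(h_bar_1, 3), round(h_bar_2, 3))
```

The denominators 359 and 182 are the sums of the observed survival times used in the text for group 1 and group 2, respectively.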
As previously described, the hazard rate indicates
failure potential rather than survival probability. 
Thus, the higher the average hazard rate, the lower 
is the group's probability of surviving. 


In our example, the average hazard for the treat¬ 
ment group is smaller than the average hazard for 
the placebo group. 


Placebo hazard > treatment hazard: 
suggests that treatment is more 
effective than placebo 


Thus, using average hazard rates, we again see that 
the treatment group appears to be doing better 
overall than the placebo group; that is, the treat¬ 
ment group is less prone to fail than the placebo 
group. 


Descriptive measures (\bar{T} and \bar{h}) give
overall comparison; they do not
give comparison over time.


The descriptive measures we have used so far—the 
ordinary average and the hazard rate average— 
provide overall comparisons of the treatment 
group with the placebo group. These measures 
don't compare the two groups at different points in 
time of follow-up. Such a comparison is provided 
by a graph of survivor curves. 



Here we present the estimated survivor curves
for the treatment and placebo groups. The method
used to get these curves is called the Kaplan- 
Meier method, which is described in Chapter 2. 
When estimated, these curves are actually step 
functions that allow us to compare the treat¬ 
ment and placebo groups over time. The graph 
shows that the survivor function for the treat¬ 
ment group consistently lies above that for the 
placebo group; this difference indicates that the 
treatment appears effective at all points of follow¬ 
up. Notice, however, that the two functions are 
somewhat closer together in the first few weeks of 
follow-up, but thereafter are quite spread apart. 
This widening gap suggests that the treatment is 
more effective later during follow-up than it is 
early on. 
Median (treatment) = 23 weeks
Median (placebo) = 8 weeks


Also notice from the graph that one can obtain
estimates of the median survival time, the time at
which the survival probability is .5 for each group.
Graphically, the median is obtained by proceeding
horizontally from the 0.5 point on the Y-axis until
the survivor curve is reached, as marked by an
arrow, and then proceeding vertically downward
until the X-axis is crossed at the median survival
time.

For the treatment group, the median is 23 weeks; 
for the placebo group, the median is 8 weeks. Com¬ 
parison of the two medians reinforces our previ¬ 
ous observation that the treatment is more effec¬ 
tive overall than the placebo. 
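
The medians quoted above can be checked numerically. The sketch below assumes the usual product-limit form of the Kaplan-Meier estimate, which is not derived in this section (the method itself is described in Chapter 2), and it takes the median to be the first ordered failure time at which the estimated survival probability is .5 or less. The function names are assumptions made for illustration.

```python
def kaplan_meier(times, deltas):
    """Return [(t_(j), S-hat(t_(j)))] using the product-limit formula.

    At each distinct failure time, the running estimate is multiplied by
    (number at risk - number failing) / (number at risk).
    """
    failure_times = sorted({t for t, d in zip(times, deltas) if d == 1})
    surv = 1.0
    curve = []
    for tj in failure_times:
        n_at_risk = sum(1 for t in times if t >= tj)
        m = sum(1 for t, d in zip(times, deltas) if t == tj and d == 1)
        surv *= (n_at_risk - m) / n_at_risk
        curve.append((tj, surv))
    return curve


def median_survival(curve):
    """First failure time where the step function is at or below 0.5."""
    for tj, s in curve:
        if s <= 0.5:
            return tj
    return None  # the curve never reaches 0.5


g1_times = [6, 6, 6, 7, 10, 13, 16, 22, 23,
            6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35]
g1_deltas = [1] * 9 + [0] * 12
g2_times = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8,
            11, 11, 12, 12, 15, 17, 22, 23]
g2_deltas = [1] * 21

median_1 = median_survival(kaplan_meier(g1_times, g1_deltas))
median_2 = median_survival(kaplan_meier(g2_times, g2_deltas))
print(median_1, median_2)
```

Because the estimated curves are step functions, reading the median off the graph and scanning the steps in code amount to the same operation.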


VIII. Example: Extended 
Remission Data 


Group 1                      Group 2
t (weeks)    log WBC         t (weeks)    log WBC

 6           2.31             1           2.80
 6           4.06             1           5.00
 6           3.28             2           4.91
 7           4.43             2           4.48
10           2.96             3           4.01
13           2.88             4           4.36
16           3.60             4           2.42
22           2.32             5           3.49
23           2.57             5           3.97
 6+          3.20             8           3.52
 9+          2.80             8           3.05
10+          2.70             8           2.32
11+          2.60             8           3.26
17+          2.16            11           3.49
19+          2.05            11           2.12
20+          2.01            12           1.50
25+          1.78            12           3.06
32+          2.20            15           2.30
32+          2.53            17           2.95
34+          1.47            22           2.73
35+          1.45            23           1.97


Before proceeding to another data set, we consider
the remission example data (Freireich et al.,
Blood, 1963) in an extended form. The table at the
left gives the remission survival times for the two 
groups with additional information about white 
blood cell count for each person studied. In par¬ 
ticular, each person’s log white blood cell count 
is given next to that person’s survival time. The 
epidemiologic reason for adding log WBC to the 
data set is that this variable is usually considered 
an important predictor of survival in leukemia pa¬ 
tients; the higher the WBC, the worse the prog¬ 
nosis. Thus, any comparison of the effects of two 
treatment groups needs to consider the possible 
confounding effect of such a variable. 
EXAMPLE: CONFOUNDING

Treatment group: mean log WBC = 1.8
Placebo group: mean log WBC = 4.1
Indicates confounding of treatment
effect by log WBC

[Figure: frequency distributions of log WBC for the treatment and placebo groups]

Need to adjust for imbalance in the
distribution of log WBC


Although a full exposition of the nature of con¬ 
founding is not intended here, we provide a sim¬ 
ple scenario to give you the basic idea. Suppose 
all of the subjects in the treatment group had very 
low log WBC, with an average, for example, of 1.8, 
whereas all of the subjects in the placebo group 
had very high log WBC, with an average of 4.1. 
We would have to conclude that the results we've 
seen so far that compare treatment with placebo 
groups may be misleading. 

The additional information on log WBC would 
suggest that the treatment group is surviving 
longer simply because of their low WBC and not 
because of the efficacy of the treatment itself. In 
this case, we would say that the treatment effect 
is confounded by the effect of log WBC. 

More typically, the distribution of log WBC may be 
quite different in the treatment group than in the 
control group. We have illustrated one extreme in 
the graph at the left. Even though such an extreme 
is not likely, and is not true for the data given here, 
the point is that some attempt needs to be made to 
adjust for whatever imbalance there is in the dis¬ 
tribution of log WBC. However, if high log WBC 
count was a consequence of the treatment, then 
white blood cell count should not be controlled 
for in the analysis. 


EXAMPLE: INTERACTION 


High log WBC Low log WBC 



Another issue to consider regarding the effect of 
log WBC is interaction. What we mean by inter¬ 
action is that the effect of the treatment may be 
different, depending on the level of log WBC. For 
example, suppose that for persons with high log 
WBC, survival probabilities for the treatment are 
consistently higher over time than for the placebo. 
This circumstance is illustrated by the first graph 
at the left. In contrast, the second graph, which 
considers only persons with low log WBC, shows 
no difference in treatment and placebo effect over 
time. In such a situation, we would say that there 
is strong treatment by log WBC interaction, and 
we would have to qualify the effect of the treat¬ 
ment as depending on the level of log WBC. 
Need to consider:

• interaction; 

• confounding. 


The example of interaction we just gave is but one 
way interaction can occur; on the other hand, in¬ 
teraction may not occur at all. As with confound¬ 
ing, it is beyond our scope to provide a thorough 
discussion of interaction. In any case, the assess¬ 
ment of interaction is something to consider in 
one’s analysis in addition to confounding that in¬ 
volves explanatory variables. 


The problem: 

Compare two groups after adjusting 
for confounding and interaction. 


Thus, with our extended data example, the basic 
problem can be described as follows: to compare 
the survival experience of the two groups after ad¬ 
justing for the possible confounding and/or inter¬ 
action effects of log WBC. 



The problem statement tells us that we are now 
considering two explanatory variables in our ex¬ 
tended example, whereas we previously consid¬ 
ered a single variable, group status. The data lay¬ 
out for the computer needs to reflect the addition 
of the second variable, log WBC. The extended ta¬ 
ble in computer layout form is given at the left. 
Notice that we have labeled the two explanatory
variables X_1 (for group status) and X_2 (for log
WBC). The variable X_1 is our primary study or
exposure variable of interest here, and the variable
X_2 is an extraneous variable that we are interested
in accounting for because of either confounding or
interaction.
Analysis alternatives:

• stratify on log WBC; 

• use math modeling, e.g., 
proportional hazards model. 


As implied by our extended example, which con¬ 
siders the possible confounding or interaction ef¬ 
fect of log WBC, we need to consider methods for 
adjusting for log WBC and/or assessing its effect in 
addition to assessing the effect of treatment group. 
The two most popular alternatives for analysis are 
the following: 

• to stratify on log WBC and compare survival 
curves for different strata; or 

• to use mathematical modeling procedures 
such as the proportional hazards or other sur¬ 
vival models; such methods will be described 
in subsequent chapters. 
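
As a rough illustration of the first alternative, the sketch below splits subjects into low and high log WBC strata and compares the two groups within each stratum. The records, the cutoff of 3.0, and the function names are hypothetical. The average hazard rate (failures divided by total follow-up time) is used here as a simple stand-in for a comparison of survival curves.

```python
from collections import defaultdict

# Hypothetical records: (group, log_wbc, weeks, failed). These values are
# made up for illustration and are NOT the remission data from the text.
records = [
    ("treatment", 1.5, 20, 0),
    ("treatment", 2.1, 12, 1),
    ("treatment", 3.6, 6, 1),
    ("placebo", 1.8, 10, 1),
    ("placebo", 3.2, 4, 1),
    ("placebo", 4.0, 2, 1),
]


def stratum(log_wbc, cutoff=3.0):
    """Label a subject as low or high log WBC; the cutoff is an assumption."""
    return "low" if log_wbc < cutoff else "high"


def average_hazard_by_stratum(rows):
    """Return {(stratum, group): number of failures / total follow-up time}."""
    failures = defaultdict(int)
    follow_up = defaultdict(float)
    for group, log_wbc, weeks, failed in rows:
        key = (stratum(log_wbc), group)
        failures[key] += failed
        follow_up[key] += weeks
    return {key: failures[key] / follow_up[key] for key in follow_up}


rates = average_hazard_by_stratum(records)
for key in sorted(rates):
    print(key, round(rates[key], 4))
```

Comparing the two groups within the same stratum, rather than overall, is what removes the imbalance in log WBC from the comparison.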


IX. Multivariable Example 

• Describes general multivariable 
survival problem. 

• Gives analogy to regression 
problems. 


We now consider one other example. Our purpose 
here is to describe a more general type of mul¬ 
tivariable survival analysis problem. The reader 
may see the analogy of this example to multiple 
regression or even logistic regression data prob¬ 
lems. 


EXAMPLE 


13-year follow-up of fixed cohort from 
Evans County, Georgia 

n = 170 white males (60+) 

T = years until death 
Event = death 

Explanatory variables: 

• exposure variable 

• confounders 

• interaction variables 

Exposure: 

Social Network Index (SNI) 


0     1     2     3     4     5

Absence of                  Excellent
social network              social network


We consider a data set developed from a 13-year
follow-up study of a fixed cohort of persons in
Evans County, Georgia, during the period 1967-1980
(Schoenbach et al., Amer. J. Epid., 1986).
From this data set, we focus on a portion containing
n = 170 white males who are age 60 or older
at the start of follow-up in 1967.

For this data set, the outcome variable is T , time 
in years until death from start of follow-up, so 
the event of interest is death. Several explanatory 
variables are measured, one of which is considered 
the primary exposure variable; the other variables 
are considered as potential confounders and/or in¬ 
teraction variables. 

The primary exposure variable is a measure called 
Social Network Index (SNI). This is an ordinal 
variable derived from questionnaire measurement 
and is designed to assess the extent to which a 
study subject has social contacts of various types. 
With the questionnaire, a scale is used with values 
ranging from 0 (absence of any social network) to 
5 (excellent social network). 
EXAMPLE (continued)


Study goal: to determine whether SNI is
protective against death,
i.e., SNI ↑ => S(t) ↑


Explanatory variables:

SNI      Exposure variable

AGE
SBP
CHR      Potential confounders/
QUET     interaction variables
SOCL

Note: QUET = weight / (height)^2 x 100


The problem: 

To describe the relationship between 
SNI and time to death, after 
controlling for AGE, SBP, CHR, 
QUET, and SOCL. 


Goals: 

• Measure of effect (adjusted) 

• Survivor curves for different SNI 
categories (adjusted) 

• Decide on variables to be adjusted; 
determine method of adjustment 


The study's goal is to determine whether one’s 
social network, as measured by SNI, is protec¬ 
tive against death. If this study hypothesis is cor¬ 
rect, then the higher the social network score, the 
longer will be one's survival time. 

In evaluating this problem, several explanatory 
variables, in addition to SNI, are measured at the 
start of follow-up. These include AGE, systolic 
blood pressure (SBP), an indicator of the presence 
or absence of some chronic disease (CHR), body 
size as measured by Quetelet's index (QUET = 
weight over height squared times 100), and social 
class (SOCL). 

These five additional variables are of interest be¬ 
cause they are thought to have their own special 
or collective influence on how long a person will 
survive. Consequently, these variables are viewed 
as potential confounders and/or interaction vari¬ 
ables in evaluating the effect of social network on 
time to death. 

We can now clearly state the problem being ad¬ 
dressed by this study: To describe the relationship 
between SNI and time to death, controlling for 
AGE, SBP, CHR, QUET, and SOCL. 

Our goals in using survival analysis to solve this 
problem are as follows: 

• to obtain some measure of effect that will de¬ 
scribe the relationship between SNI and time 
until death, after adjusting for the other vari¬ 
ables we have identified; 

• to develop survival curves that describe the 
probability of survival over time for different 
categories of social networks; in particular, we 
wish to compare the survival of persons with 
excellent networks to the survival of persons 
with poor networks. Such survival curves need 
to be adjusted for the effects of other variables. 

• to achieve these goals, two intermediary goals 
are to decide which of the additional variables 
being considered need to be adjusted and to 
determine an appropriate method of adjust¬ 
ment. 
The computer data layout for this problem is given
below. The first column lists the 170 individuals 
in the data set. The second column lists the sur¬ 
vival times, and the third column lists failure or 
censored status. The remainder of the columns 
list the 6 explanatory variables of interest, start¬ 
ing with the exposure variable SNI and continu¬ 
ing with the variables to be accounted for in the 
analysis. 


Computer layout: 13-year follow-up study (1967-1980) of a fixed cohort of n = 170 
white males (60+) from Evans County, Georgia 


#     t       δ       SNI       AGE       SBP       CHR       QUET       SOCL

1     t_1     δ_1     SNI_1     AGE_1     SBP_1     CHR_1     QUET_1     SOCL_1
2     t_2     δ_2     SNI_2     AGE_2     SBP_2     CHR_2     QUET_2     SOCL_2
.     .       .       .         .         .         .         .          .
170   t_170   δ_170   SNI_170   AGE_170   SBP_170   CHR_170   QUET_170   SOCL_170
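
One way to hold a row of this layout in a program is sketched below. The class name, field names, coding assumptions (CHR as 0/1, SOCL as an integer class), and the example values are all hypothetical; only the column order and the 0 to 5 range of SNI come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    """One row of the computer layout; fields follow the column headings."""
    ident: int      # subject number (#)
    t: float        # survival time in years
    delta: int      # 1 = died (failure), 0 = censored
    sni: int        # Social Network Index, 0 (absence) to 5 (excellent)
    age: float
    sbp: float
    chronic: int    # CHR, assumed here to be coded 0/1
    quet: float
    socl: int       # SOCL, assumed here to be an integer class code

    def __post_init__(self):
        if self.delta not in (0, 1):
            raise ValueError("delta must be 0 or 1")
        if not 0 <= self.sni <= 5:
            raise ValueError("SNI must be between 0 and 5")
        if self.t < 0:
            raise ValueError("survival time cannot be negative")


# Made-up values for illustration only; not taken from the Evans County data.
example = Subject(ident=1, t=12.3, delta=0, sni=4, age=63, sbp=140,
                  chronic=0, quet=3.5, socl=2)
print(example)
```

Validating δ and SNI when a row is created catches coding errors before any analysis is run.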


X. Math Models in Survival 
Analysis 

General framework 

E → D

Controlling for C_1, C_2, ..., C_p.

SNI study:

E = SNI → D = survival time

Controlling for AGE, SBP, CHR, 
QUET, and SOCL 


It is beyond the scope of this presentation to pro¬ 
vide specific details of the survival analysis of 
these data. Nevertheless, the problem addressed 
by these data is closely analogous to the typical 
multivariable problem addressed by linear and lo¬ 
gistic regression modeling. Regardless of which 
modeling approach is chosen, the typical problem 
concerns describing the relationship between an 
exposure variable (e.g., E) and an outcome variable
(e.g., D) after controlling for the possible confounding
and interaction effects of additional variables
(e.g., C_1, C_2, and so on up to C_p). In our
survival analysis example, E is the social network 
variable SNI, D is the survival time variable, and 
there are p = 5 C variables, namely, AGE, SBP, 
CHR, QUET, and SOCL. 
Model                  Outcome

Survival analysis      Time to event (with censoring)
Linear regression      Continuous (SBP)
Logistic regression    Dichotomous (CHD yes/no)


Nevertheless, an important distinction among 
modeling methods is the type of outcome vari¬ 
able being used. In survival analysis, the outcome 
variable is “time to an event,” and there may be 
censored data. In linear regression modeling, the 
outcome variable is generally a continuous vari¬ 
able, like blood pressure. In logistic modeling, the 
outcome variable is a dichotomous variable, like 
CHD status, yes or no. And with linear or logistic 
modeling, we usually do not have information on 
follow-up time available. 


As with linear and logistic modeling, one statisti¬ 
cal goal of a survival analysis is to obtain some 
measure of effect that describes the exposure- 
outcome relationship adjusted for relevant extra¬ 
neous variables. 


Measure of effect:

Linear regression:
regression coefficient β

Logistic regression:
odds ratio e^β

Survival analysis:
hazard ratio e^β


In linear regression modeling, the measure of effect
is usually some regression coefficient β.

In logistic modeling, the measure of effect is an
odds ratio expressed in terms of an exponential of
one or more regression coefficients in the model,
for example, e to the β.

In survival analysis, the measure of effect typically 
obtained is called a hazard ratio; as with the logis¬ 
tic model, this hazard ratio is expressed in terms 
of an exponential of one or more regression coef¬ 
ficients in the model. 


EXAMPLE 


SNI study: hazard ratio (HR) describes 
relationship between SNI and T, after 
controlling for covariates. 


Thus, from the example of survival analysis mod¬ 
eling of the social network data, one may obtain 
a hazard ratio that describes the relationship be¬ 
tween SNI and survival time (T), after controlling 
for the appropriate covariates. 
Interpretation of HR (like OR):

HR = 1 => no relationship

HR = 10 => exposed hazard
10 times unexposed

HR = 1/10 => exposed hazard
1/10 times unexposed


The hazard ratio, although a different measure 
from an odds ratio, nevertheless has a similar in¬ 
terpretation of the strength of the effect. A haz¬ 
ard ratio of 1, like an odds ratio of 1, means that 
there is no effect; that is, 1 is the null value for 
the exposure-outcome relationship. A hazard ra¬ 
tio of 10, on the other hand, is interpreted like an 
odds ratio of 10; that is, the exposed group has ten 
times the hazard of the unexposed group. Simi¬ 
larly, a hazard ratio of 1/10 implies that the ex¬ 
posed group has one-tenth the hazard of the un¬ 
exposed group. 
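
The interpretation above can be sketched in a few lines, assuming the simplest case in which the hazard ratio is the exponential of a single regression coefficient β for a (0,1) exposure variable. The function names are hypothetical, and the β values are chosen only to reproduce the three cases just described.

```python
import math


def hazard_ratio(beta):
    """Hazard ratio for a single coefficient: HR = exp(beta)."""
    return math.exp(beta)


def interpret(hr, tol=1e-9):
    """Describe a hazard ratio in words; 1 is the null value."""
    if abs(hr - 1.0) < tol:
        return "no relationship"
    return f"exposed hazard is {hr:g} times the unexposed hazard"


# beta = 0 gives HR = 1; beta = ln(10) gives HR = 10; beta = -ln(10) gives 1/10.
for beta in (0.0, math.log(10), -math.log(10)):
    print(interpret(hazard_ratio(beta)))
```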


Chapters

✓ 1. Introduction

2. Kaplan-Meier Survival Curves
and the Log-Rank Test


This presentation is now complete. We suggest 
that you review the material covered here by read¬ 
ing the detailed outline that follows. Then do the 
practice exercises and test. 

In Chapter 2 we describe how to estimate and 
graph survival curves using the Kaplan-Meier 
(KM) method. We also describe how to test 
whether two or more survival curves are estimat¬ 
ing a common curve. The most popular such test 
is called the log-rank test. 
Detailed Outline


I. What is survival analysis?

A. Type of problem addressed: outcome variable is 
time until an event occurs. 

B. Assume one event of interest; more than one type 
of event implies a competing risk problem. 

C. Terminology: time = survival time; event = failure. 

D. Examples of survival analysis: 

i. leukemia patients/time in remission 

ii. disease-free cohort/time until heart disease 

iii. elderly population/time until death 

iv. parolees/time until rearrest (recidivism) 

v. heart transplants/time until death 

II. Censored data

A. Definition: don’t know exact survival time. 

B. Reasons: study ends without subject getting event; 
lost to follow-up; withdraws. 

C. Examples of survival data for different persons; 
summary table. 

III. Terminology and notation

A. Notation: T = survival time random variable;

t = specific value for T;
δ = (0-1) variable for failure/censorship
status

B. Terminology: S(t) = survivor function 

h(t) = hazard function 

C. Properties of survivor function: 

• theoretically, graph is smooth curve, decreasing
from S(t) = 1 at time t = 0 to S(t) = 0 at t = ∞;

• in practice, graph is step function that may not 
go all the way to zero at end of study if not 
everyone studied gets the event. 

D. Hazard function formula: 

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$

E. Hazard function properties: 

• h(t) gives instantaneous potential for event to
occur given survival up to time t; 

• instantaneous potential idea is illustrated by 
velocity; 

• hazard function also called “conditional failure 
rate”; 

• h(t) ≥ 0; has no upper bound; not a probability;
depends on time units. 
F. Examples of hazard curves:

i. exponential 

ii. increasing Weibull 

iii. decreasing Weibull 

iv. log normal 

G. Uses of hazard function: 

• gives insight about conditional failure rates; 

• identifies specific model form; 

• math model for survival analysis is usually 
written in terms of hazard function. 

H. Relationship of S(t) to h(t): if you know one, you 
can determine the other. 

• example: h(t) = λ if and only if S(t) = e^{-λt}

• general formulae:

$$S(t) = \exp\left[-\int_0^t h(u)\,du\right]$$

$$h(t) = -\left[\frac{dS(t)/dt}{S(t)}\right]$$


IV. Goals of survival analysis

A. Estimate and interpret survivor and/or hazard 
functions. 

B. Compare survivor and/or hazard functions. 

C. Assess the relationship of explanatory variables 
to survival time. 

V. Basic data layout for computer

A. General layout: 


#    t     δ     X_1    X_2   ...  X_p

1    t_1   δ_1   X_11   X_12  ...  X_1p
2    t_2   δ_2   X_21   X_22  ...  X_2p
.    .     .     .      .          .
n    t_n   δ_n   X_n1   X_n2  ...  X_np



B. Example: Remission time data 












VI. Basic data layout for understanding analysis

A. General layout: 


Ordered            # of        # censored in          Risk
failure times      failures    [t_(j), t_(j+1))       set
(t_(j))            (m_j)       (q_j)                  R(t_(j))

t_(0) = 0          m_0 = 0     q_0                    R(t_(0))
t_(1)              m_1         q_1                    R(t_(1))
t_(2)              m_2         q_2                    R(t_(2))
.                  .           .                      .
t_(k)              m_k         q_k                    R(t_(k))


Note: k = # of distinct times at which subjects
failed; n = # of subjects (k ≤ n); R(t_(j)), the risk
set, is the set of individuals whose survival times
are at least t_(j).

B. Example: Remission time data 
Group 1 (n = 21, 9 failures, k = 7);

Group 2 (n = 21, 21 failures, k = 12)

C. How to work with censored data: 

Use all information up to the time of censorship; 
don’t throw away information. 

VII. Descriptive measures of survival experience

A. Average survival time (ignoring censorship
status):

$$\bar{T} = \frac{\sum_{i=1}^{n} t_i}{n}$$

$\bar{T}$ underestimates the true average
survival time, because censored
times are included in the formula.

B. Average hazard rate:

$$\bar{h} = \frac{\#\ \text{failures}}{\sum_{i=1}^{n} t_i}$$

C. Descriptive measures $\bar{T}$ and $\bar{h}$ give overall
comparison; estimated survivor curves give
comparison over time.
D. Estimated survivor curves are step function
graphs. 

E. Median survival time: graphically, proceed
horizontally from 0.5 on the Y-axis until
reaching graph, then vertically downward until
reaching the X-axis.
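
The two descriptive measures in items A and B of this section can be computed directly from the (t, δ) columns of the computer layout. The sketch below uses hypothetical function names and a made-up four-subject data set.

```python
def average_survival_time(times):
    """T-bar: sum of all observed times (censored or not) divided by n."""
    return sum(times) / len(times)


def average_hazard_rate(times, status):
    """h-bar: number of failures divided by the sum of all observed times."""
    return sum(status) / sum(times)


# Hypothetical data: observed times in weeks and status (1 = failed, 0 = censored).
times = [2, 4, 6, 8]
status = [1, 0, 1, 0]

t_bar = average_survival_time(times)        # (2 + 4 + 6 + 8) / 4
h_bar = average_hazard_rate(times, status)  # 2 failures / 20 weeks
print(t_bar, h_bar)
```

Because the censored times 4 and 8 enter the sum as if they were complete, the average of 5 weeks understates the true average survival time for these subjects.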

VIII. Example: Extended remission data

A. Extended data adds log WBC to previous 
remission data. 

B. Need to consider confounding and interaction. 

C. Extended data problem: compare survival 
experience of two groups, after adjusting for 
confounding and interaction effects of log WBC. 

D. Analysis alternatives: 

i. stratify on log WBC and compare survival 
curves for different strata; 

ii. use math modeling, e.g., proportional 
hazards model. 

IX. Multivariable example

A. The problem: to describe the relationship 
between social network index (SNI) and time 
until death, controlling for AGE, systolic blood 
pressure (SBP), presence or absence of chronic 
disease (CHR), Quetelet’s index (QUET—a 
measure of body size), and social class (SOCL). 

B. Goals: 

• to obtain an adjusted measure of effect; 

• to obtain adjusted survivor curves for different 
SNI categories; 

• to decide on variables to be adjusted. 

C. The data: 13-year follow-up study (1967-1980) of 
a fixed cohort of n = 170 white males (60+) from 
Evans County, Georgia. 


#     t       δ       SNI       AGE       SBP       CHR       QUET       SOCL

1     t_1     δ_1     SNI_1     AGE_1     SBP_1     CHR_1     QUET_1     SOCL_1
2     t_2     δ_2     SNI_2     AGE_2     SBP_2     CHR_2     QUET_2     SOCL_2
.     .       .       .         .         .         .         .          .
170   t_170   δ_170   SNI_170   AGE_170   SBP_170   CHR_170   QUET_170   SOCL_170


X. Math models in survival analysis

A. Survival analysis problem is analogous to typical 
multivariable problem addressed by linear 
and/or logistic regression modeling: describe 
relationship of exposure to outcome, after 
accounting for possible confounding and 
interaction. 

B. Outcome variable (time to event) for survival 
analysis is different from linear (continuous) or 
logistic (dichotomous) modeling. 

C. Measure of effect typically used in survival 
analysis: hazard ratio (HR). 

D. Interpretation of HR: like OR. SNI study: HR 
describes relationship between SNI and T , after 
controlling for covariates. 


Practice 

Exercises 


True or False (Circle T or F): 


T F 1. In a survival analysis, the outcome variable is
dichotomous.

T F 2. In a survival analysis, the event is usually
described by a (0,1) variable.

T F 3. If the study ends before an individual has gotten
the event, then his or her survival time is censored.

T F 4. If, for a given individual, the event occurs before
the person is lost to follow-up or withdraws from
the study, then this person’s survival time is
censored.

T F 5. S(t) = P(T > t) is called the hazard function.

T F 6. The hazard function is a probability.

T F 7. Theoretically, the graph of a survivor function is
a smooth curve that decreases from S(t) = 1 at
t = 0 to S(t) = 0 at t = ∞.

T F 8. The survivor function at time t gives the
instantaneous potential per unit time for a failure to occur,
given survival up to time t.

T F 9. The formula for a hazard function involves a
conditional probability as one of its components.

T F 10. The hazard function theoretically has no upper
bound.

T F 11. Mathematical models for survival analysis are
frequently written in terms of a hazard function.

T F 12. One goal of a survival analysis is to compare
survivor and/or hazard functions.
T F 13. Ordered failure times are censored data.

T F 14. Censored data are used in the analysis of survival 
data up to the time interval of censorship. 

T F 15. A typical goal of a survival analysis involving sev¬ 
eral explanatory variables is to obtain an adjusted 
measure of effect. 

16. Given the following survival time data (in weeks) for 
n = 15 subjects, 

1, 1, 1+, 1+, 1+, 2, 2, 2, 2+, 2+, 3, 3, 3+, 4+, 5+

where + denotes censored data, complete the following 
table: 

t_(j)   m_j   q_j   R(t_(j))

0       0     0     15 persons survive ≥ 0 weeks

1
2
3

Also, compute the average survival time ($\bar{T}$) and the
average hazard rate ($\bar{h}$) using the raw data (ignoring + signs
for $\bar{T}$).

17. Suppose that the estimated survivor curve for the above 
table is given by the following graph: 

[Figure: estimated survivor curve S(t) plotted against t, with the t-axis marked at 0, 1, 2, 3]

What is the median survival time for this cohort? 


Questions 18-20 consider the comparison of the 
following two survivor curves: 


[Figure: survivor curves S(t) for Group A and Group B, with time t* marked on the time axis]

18. Which group has a better survival prognosis before
time t*?

19. Which group has a better survival prognosis after
time t*?

20. Which group has a longer median survival time? 


Test 


True or False (Circle T or F): 


T F 1. Survival analysis is a collection of statistical
procedures for data analysis for which the outcome
variable is time until an event occurs.

T F 2. In survival analysis, the term “event” is
synonymous with “failure.”

T F 3. If a given individual is lost to follow-up or
withdraws from the study before the end of the study
without the event occurring, then the survival
time for this individual is said to be “censored.”

T F 4. In practice, the survivor function is usually
graphed as a smooth curve.

T F 5. The survivor function ranges between 0 and ∞.

T F 6. The concept of instantaneous potential is
illustrated by velocity.

T F 7. A hazard rate of one per day is equivalent to seven
per week.

T F 8. If you know the form of a hazard function, then
you can determine the corresponding survivor
curve, and vice versa.

T F 9. One use of a hazard function is to gain insight
about conditional failure rates.

T F 10. If the survival curve for group 1 lies completely
above the survival curve for group 2, then the
median survival time for group 2 is longer than that
for group 1.

T F 11. The risk set at six weeks is the set of individuals
whose survival times are less than or equal to
six weeks.

T F 12. If the risk set at six weeks consists of 22 persons,
and four persons fail and three persons are
censored by the 7th week, then the risk set at seven
weeks consists of 18 persons.

T F 13. The measure of effect used in survival analysis is
an odds ratio.

T F 14. If a hazard ratio comparing group 1 relative to
group 2 equals 10, then the potential for failure is
ten times higher in group 1 than in group 2.
T F 15. The outcome variable used in a survival analysis
is different from that used in linear or logistic
modeling.

16. State two properties of a hazard function. 

17. State three reasons why hazard functions are used. 

18. State three goals of a survival analysis. 

19. The following data are a sample from the 1967-1980 
Evans County study. Survival times (in years) are given 
for two study groups, each with 25 participants. Group 1 
has no history of chronic disease (CHR = 0), and group 
2 has a positive history of chronic disease (CHR =1): 

Group 1 (CHR = 0): 12.3+, 5.4, 8.2, 12.2+, 11.7, 10.0,
5.7, 9.8, 2.6, 11.0, 9.2, 12.1+, 6.6,
2.2, 1.8, 10.2, 10.7, 11.1, 5.3, 3.5,
9.2, 2.5, 8.7, 3.8, 3.0

Group 2 (CHR = 1): 5.8, 2.9, 8.4, 8.3, 9.1, 4.2, 4.1, 1.8,
3.1, 11.4, 2.4, 1.4, 5.9, 1.6, 2.8,
4.9, 3.5, 6.5, 9.9, 3.6, 5.2, 8.8, 7.8,
4.7, 3.9

For group 1, complete the following table involving 
ordered failure times: 

t_(j)   m_j   q_j   R(t_(j))

Group 1:

0.0     0     0     25 persons survived ≥ 0 years

1.8     1     0     25 persons survived ≥ 1.8 years

2.2 

2.5 

2.6 
3.0 

3.5 

3.8 

5.3 

5.4 

5.7 

6.6 

8.2 

8.7 

9.2 

9.8 
10.0 

10.2 

10.7 
11.0 
11.1 

11.7 
20. For the data of Problem 19, the average survival time
($\bar{T}$) and the average hazard rate ($\bar{h}$) for each group are
given as follows:

            $\bar{T}$     $\bar{h}$

Group 1:    7.5     .1165

Group 2:    5.3     .1894

a. Based on the above information, which group has a 
better survival prognosis? Explain briefly. 

b. How would a comparison of survivor curves provide 
additional information to what is provided in the 
above table? 


Answers to 

Practice 

Exercises 


1. F: the outcome is continuous; time until an event occurs. 

2. T 

3. T 

4. F: the person fails, i.e., is not censored. 

5. F: S(t) is the survivor function.

6. F: the hazard is a rate, not a probability. 

7. T 

8. F: the hazard function gives instantaneous potential. 

9. T 

10. T 

11. T 

12. T 

13. F: ordered failure times are data for persons who are 
failures. 

14. T 


15. T
16.

t_(j)   m_j   q_j   R(t_(j))

0       0     0     15 persons survive ≥ 0 weeks
1       2     3     15 persons survive ≥ 1 week
2       3     2     10 persons survive ≥ 2 weeks
3       2     3     5 persons survive ≥ 3 weeks

$\bar{T} = 33/15 = 2.2$; $\bar{h} = 7/33 = 0.2121$

17. Median = 3 weeks 

18. Group A 

19. Group B 

20. Group A 




Kaplan-Meier Survival Curves and the Log-Rank Test
Introduction


Abbreviated 

Outline 


We begin with a brief review of the purposes of survival analy¬ 
sis, basic notation and terminology, and the basic data layout 
for the computer. 

We then describe how to estimate and graph survival curves 
using the Kaplan-Meier (KM) method. The estimated sur¬ 
vival probabilities are computed using a product limit 
formula. 

Next, we describe how to compare two or more survival 
curves using the log-rank test of the null hypothesis of a 
common survival curve. For two groups, the log-rank statis¬ 
tic is based on the summed observed minus expected score for 
a given group and its variance estimate. For several groups, 
a computer should always be used because the log-rank formula
is more complicated mathematically. The test statistic
is approximately chi-square in large samples with G - 1 degrees
of freedom, where G denotes the number of groups being
compared.

Several alternatives to the log-rank test will be briefly described.
These tests are variations of the log-rank test that
weight each observation differently. They are also large-sample
chi-square tests with G - 1 degrees of freedom.


The outline below gives the user a preview of the material to 
be covered by the presentation. A detailed outline for review 
purposes follows the presentation. 

I. Review

II. An example of Kaplan-Meier curves

III. General features of KM curves

IV. The log-rank test for two groups

V. The log-rank test for several groups

VI. Alternatives to the log-rank test

VII. Summary
Objectives


Upon completing the chapter, the learner should be able to: 

1. Compute Kaplan-Meier (KM) probabilities of survival, 
given survival time and failure status information on a 
sample of subjects. 

2. Interpret a graph of KM curves that compare two or more 
groups. 

3. Draw conclusions as to whether or not two or more sur¬ 
vival curves are the same based on computer results that 
provide a log-rank test and/or an alternative test. 

4. Decide whether the log-rank test or one of the alternatives 
to this test is more appropriate for a given set of survival 
data. 
Presentation



This presentation describes how to plot and inter¬ 
pret survival data using Kaplan-Meier (KM) sur¬ 
vival curves and how to test whether or not two 
or more KM curves are equivalent using the log- 
rank test. We also describe alternative tests to the 
log-rank test. 


I. Review 

Start TIME Event 

Event: death 
disease 
relapse 


Time = survival time 
Event = failure 

Censoring: Don’t know survival 
time exactly 


True survival time 


Observed survival time 
Right-censored - 


We begin by reviewing the basics of survival anal¬ 
ysis. Generally, survival analysis is a collection of 
statistical procedures for the analysis of data in 
which the outcome variable of interest is time 
until an event occurs. By event, we mean death, 
disease incidence, relapse from remission, or any 
designated experience of interest that may happen 
to an individual. 

When doing a survival analysis, we usually refer 
to the time variable as survival time. We also typ¬ 
ically refer to the event as a failure. 

Most survival analyses consider a key data analyt¬ 
ical problem called censoring. In essence, censor¬ 
ing occurs when we have some information about 
individual survival time, but we don’t know the 
survival time exactly. 

Most survival time data is right-censored, because 
the true survival time interval, which we don’t re¬ 
ally know, has been cut off (i.e., censored) at the 
right side of the observed time interval, giving us 
an observed survival time that is shorter than the 
true survival time. We want to use the observed 
survival time to draw implications about the true 
survival time. 


NOTATION 


T = survival time random variable

t = specific value for T


As notation, we denote by a capital T the random 
variable for a person’s survival time. Next, we de¬ 
note by a small letter t any specific value of inter¬ 
est for the variable T. 
δ = (0, 1) random variable
  = 1 if failure
    0 if censored


S(t) = survivor function
     = Pr(T > t)


We let the Greek letter delta (δ) denote a (0,1) random
variable indicating either censorship or failure.
A person who does not fail, that is, does not
get the event during the study period, must have 
been censored either before or at the end of the 
study. 

The survivor function, denoted by S(t), gives the 
probability that the random variable T exceeds the 
specified time t. 
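
To make the definition S(t) = Pr(T > t) concrete, the sketch below estimates it as the fraction of subjects whose survival time exceeds t. It assumes every survival time is fully observed (no censoring), which is exactly the situation the Kaplan-Meier method is designed to relax. The function name and data are hypothetical.

```python
def empirical_survivor(times, t):
    """Fraction of subjects with survival time strictly greater than t.

    Valid only when no survival time is censored.
    """
    return sum(1 for x in times if x > t) / len(times)


# Hypothetical, fully observed survival times (in weeks).
times = [1, 2, 3, 4]

for t in (0, 2, 4):
    print(t, empirical_survivor(times, t))
```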


[Figure: step-function estimate of the survivor curve, starting at S(0) and plotted from t = 0 to the end of the study]


Theoretically, as t ranges from 0 up to infinity, 
the survivor function is graphed as a decreasing 
smooth curve, which begins at S(t) = 1 at t = 0 
and heads downward toward zero as t increases 
toward infinity. 


In practice, using data, we usually obtain esti¬ 
mated survivor curves that are step functions, as 
illustrated here, rather than smooth curves. 


h(t) = hazard function

= instantaneous potential 
given survival up to time t 


The hazard function, denoted by h(t), gives the in¬ 
stantaneous potential per unit time for the event 
to occur given that the individual has survived up 
to time t. 


Not failing 

h(t) 




Failing 

h(t) is a rate: 

0 to ∞


In contrast to the survivor function, which focuses 
on not failing, the hazard function focuses on fail¬ 
ing; in other words, the higher the average hazard, 
the worse the impact on survival. The hazard is a 
rate, rather than a probability. Thus, the values 
of the hazard function range between zero and 
infinity. 



Regardless of which function S(t) or h(t) one 
prefers, there is a clearly defined relationship 
between the two. In fact, if one knows the form 
of S(t), one can derive the corresponding h(t), and
vice versa. 
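
One direction of this relationship can be checked numerically. The sketch below assumes the standard identity S(t) = exp[-∫_0^t h(u) du] and evaluates the integral with the trapezoid rule. The function name, the step count, and the value λ = 0.5 are illustrative assumptions.

```python
import math


def survivor_from_hazard(h, t, steps=10_000):
    """Return S(t) = exp(-integral of h from 0 to t), via the trapezoid rule."""
    width = t / steps
    total = 0.5 * (h(0.0) + h(t))
    for i in range(1, steps):
        total += h(i * width)
    return math.exp(-total * width)


lam = 0.5  # constant hazard, chosen only for illustration

# For a constant hazard h(t) = lam, the survivor function is exp(-lam * t).
numeric = survivor_from_hazard(lambda u: lam, 2.0)
exact = math.exp(-lam * 2.0)
print(numeric, exact)
```

The same function accepts any nonnegative hazard, for example `lambda u: u`, whose integral from 0 to t is t²/2.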
General Data Layout:

Indiv. #   t     δ     X_1    X_2   ...  X_p

1          t_1   δ_1   X_11   X_12  ...  X_1p
2          t_2   δ_2   X_21   X_22  ...  X_2p
.          .     .     .      .          .
n          t_n   δ_n   X_n1   X_n2  ...  X_np

Alternative (ordered) data
layout:

Ordered          # of        # censored in       Risk
failure times,   failures,   [t(j), t(j+1)),     set,
t(j)             m_j         q_j                 R(t(j))

t(0) = 0         m_0 = 0     q_0                 R(t(0))
t(1)             m_1         q_1                 R(t(1))
t(2)             m_2         q_2                 R(t(2))
 .                .           .                   .
t(k)             m_k         q_k                 R(t(k))


The general data layout for a survival analysis is
given by the table shown here. The first column of
the table identifies the study subjects. The second
column gives the observed survival time information.
The third column gives the information for
δ, the dichotomous variable that indicates censorship
status. The remainder of the information in
the table gives values for explanatory variables of
interest.


An alternative data layout is shown here. This layout
is the basis upon which Kaplan-Meier survival
curves are derived. The first column in the
table gives ordered failure times from smallest to
largest. The second column gives frequency counts
of failures at each distinct failure time. The third
column gives frequency counts, denoted by q_j, of
those persons censored in the time interval starting
with failure time t(j) up to but not including
the next failure time, denoted by t(j+1). The last
column gives the risk set, which denotes the collection
of individuals who have survived at least
to time t(j).
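
As an illustration of how the ordered layout can be derived from the general layout, here is a minimal sketch. The function name `ordered_failure_table` and the toy data are hypothetical and are not taken from the text; the sketch assumes that a subject censored at exactly t(j) is still counted in the risk set at t(j).

```python
from collections import Counter

def ordered_failure_table(times, events):
    """Build rows (t_j, n_j, m_j, q_j) from raw (time, delta) pairs.

    times  : observed survival times
    events : 1 if the time is a failure, 0 if censored (the delta column)
    """
    failures = Counter(t for t, d in zip(times, events) if d == 1)
    # Ordered distinct failure times, with t(0) = 0 added as the starting row.
    grid = [0] + sorted(failures)
    rows = []
    for j, t in enumerate(grid):
        upper = grid[j + 1] if j + 1 < len(grid) else float("inf")
        n_j = sum(1 for u in times if u >= t)        # size of risk set R(t_j)
        m_j = failures.get(t, 0)                     # failures at t_j
        q_j = sum(1 for u, d in zip(times, events)   # censored in [t_j, t_(j+1))
                  if d == 0 and t <= u < upper)
        rows.append((t, n_j, m_j, q_j))
    return rows

# Hypothetical toy data: failures at 2, 2 and 5; censored at 3 and 7.
toy_times = [2, 2, 3, 5, 7]
toy_events = [1, 1, 0, 1, 0]
table = ordered_failure_table(toy_times, toy_events)
for row in table:
    print(row)
```

Each printed row has the form (t_j, n_j, m_j, q_j), matching the columns of the ordered layout, with the size of the risk set reported in place of the set itself.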


Table of ordered failures: 

• Uses all information up to time 
of censorship; 

• S(t) is derived from R(t).


To estimate the survival probability at a given time, 
we make use of the risk set at that time to include 
the information we have on a censored person up 
to the time of censorship, rather than simply throw 
away all the information on a censored person. 


Survival probability: 

Use Kaplan-Meier (KM) 

method. 


The actual computation of such a survival proba¬ 
bility can be carried out using the Kaplan-Meier 
(KM) method. We introduce the KM method in the 
next section by way of an example. 
II. An Example of
Kaplan-Meier Curves



The data for this example derive from a study of 
the remission times in weeks for two groups of 
leukemia patients, with 21 patients in each group. 
Group 1 is the treatment group and group 2 is 
the placebo group. The basic question of interest 
concerns comparing the survival experience of the 
two groups. 


Of the 21 persons in group 1, 9 failed during the 
study period and 12 were censored. In contrast, 
none of the data in group 2 are censored; that is, 
all 21 persons in the placebo group went out of 
remission during the study period. 

In Chapter 1, we observed for this data set that 
group 1 appears to have better survival prognosis 
than group 2, suggesting that the treatment is ef¬ 
fective. This conclusion was supported by descrip¬ 
tive statistics for the average survival time and 
average hazard rate shown. Note, however, that 
descriptive statistics provide overall comparisons 
but do not compare the two groups at different 
times of follow-up. 
EXAMPLE (continued)

Ordered failure times:

Group 1 (treatment)

t(j)   n_j   m_j   q_j
  0     21    0     0
  6     21    3     1
  7     17    1     1
 10     15    1     2
 13     12    1     0
 16     11    1     3
 22      7    1     0
 23      6    1     5
>23      -    -     -

Group 2 (placebo)

t(j)   n_j   m_j   q_j
  0     21    0     0
  1     21    2     0
  2     19    2     0
  3     17    1     0
  4     16    2     0
  5     14    2     0
  8     12    4     0
 11      8    2     0
 12      6    2     0
 15      4    1     0
 17      3    1     0
 22      2    1     0
 23      1    1     0


Group 2: no censored subjects

Group 2 (placebo)

t(j)   n_j   m_j   q_j   Ŝ(t(j))
  0     21    0     0    1
  1     21    2     0    19/21 = .90
  2     19    2     0    17/21 = .81
  3     17    1     0    16/21 = .76
  4     16    2     0    14/21 = .67
  5     14    2     0    12/21 = .57
  8     12    4     0     8/21 = .38
 11      8    2     0     6/21 = .29
 12      6    2     0     4/21 = .19
 15      4    1     0     3/21 = .14
 17      3    1     0     2/21 = .10
 22      2    1     0     1/21 = .05
 23      1    1     0     0/21 = .00



A table of ordered failure times is shown here for 
each group. These tables provide the basic infor¬ 
mation for the computation of KM curves. 

Each table begins with a survival time of zero, even 
though no subject actually failed at the start of 
follow-up. The reason for the zero is to allow for 
the possibility that some subjects might have been 
censored before the earliest failure time. 

Also, each table contains a column denoted as n_j
that gives the number of subjects in the risk set at
the start of the interval. Given that the risk set is
defined as the collection of individuals who have
survived at least to time t(j), it is assumed that n_j
includes those persons failing at time t(j). In other
words, n_j counts those subjects at risk for failing
instantaneously prior to time t(j).


We now describe how to compute the KM curve 
for the table for group 2. The computations for 
group 2 are quite straightforward because there 
are no censored subjects for this group. 

The table of ordered failure times for group 2 
is presented here again with the addition of an¬ 
other column that contains survival probability 
estimates. These estimates are the KM survival 
probabilities for this group. We will discuss the 
computations of these probabilities shortly. 
EXAMPLE (continued)



A plot of the KM survival probabilities correspond¬ 
ing to each ordered failure time is shown here for 
group 2. Empirical plots such as this one are typ¬ 
ically plotted as a step function that starts with a 
horizontal line at a survival probability of 1 and 
then steps down to the other survival probabili¬ 
ties as we move from one ordered failure time to 
another. 

We now describe how the survival probabilities for 
the group 2 data are computed. Recall that a sur¬ 
vival probability gives the probability that a study 
subject survives past a specified time. 


Thus, considering the group 2 data, the probabil¬ 
ity of surviving past zero is unity, as it will always 
be for any data set. 

Next, the probability of surviving past the first or¬ 
dered failure time of one week is given by 19/21 or 
(.90) because 2 people failed at one week, so that 
19 people from the original 21 remain as survivors 
past one week. 

Similarly, the next probability concerns subjects 
surviving past two weeks, which is 17/21 (or .81) 
because 2 subjects failed at one week and 2 sub¬ 
jects failed at two weeks leaving 17 out of the orig¬ 
inal 21 subjects surviving past two weeks. 

The remaining survival probabilities in the table 
are computed in the same manner, that is, we 
count the number of subjects surviving past the 
specified time being considered and divide this 
number by 21, the number of subjects at the start 
of follow-up. 

Recall that no subject in group 2 was censored, so 
the q column for group 2 consists entirely of ze¬ 
ros. If some of the q’s had been nonzero, an alter¬ 
native formula for computing survival probabili¬ 
ties would be needed. This alternative formula is 
called the Kaplan-Meier (KM) approach and can 
be illustrated using the group 2 data even though 
all values of q are zero. 
For example, an alternative way to calculate the
survival probability of exceeding four weeks for
the group 2 data can be written using the KM formula
shown here. This formula involves the product
of conditional probability terms. That is, each
term in the product is the probability of exceeding
a specific ordered failure time t(j) given that a
subject survives up to that failure time.

Thus, in the KM formula for survival past four 
weeks, the term 19/21 gives the probability of sur¬ 
viving past the first ordered failure time, one week, 
given survival up to the first week. Note that all 21 
persons in group 2 survived up to one week, but 
that 2 failed at one week, leaving 19 persons sur¬ 
viving past one week. 

Similarly, the term 16/17 gives the probability of 
surviving past the third ordered failure time at 
week 3, given survival up to week 3. There were 
17 persons who survived up to week 3 and one of 
these then failed, leaving 16 survivors past week 3. 
Note that the 17 persons in the denominator rep¬ 
resents the number in the risk set at week 3. 

Notice that the product terms in the KM formula 
for surviving past four weeks stop at the fourth 
week with the component 14/16. Similarly, the KM 
formula for surviving past eight weeks stops at the 
eighth week. 
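
The product described above can be checked with a short sketch that uses the group 2 counts from the table of ordered failure times. The helper name `km_product` is hypothetical; exact fractions are used so that the cancellation of terms is visible.

```python
from fractions import Fraction

# Group 2 (placebo) rows from the table: (t_j, n_j, m_j); no censoring.
group2 = [(1, 21, 2), (2, 19, 2), (3, 17, 1), (4, 16, 2), (5, 14, 2),
          (8, 12, 4), (11, 8, 2), (12, 6, 2), (15, 4, 1), (17, 3, 1),
          (22, 2, 1), (23, 1, 1)]

def km_product(rows, week):
    """Product of (n_j - m_j) / n_j over ordered failure times up to `week`."""
    s = Fraction(1)
    for t, n, m in rows:
        if t > week:
            break          # product terms stop at the week being specified
        s *= Fraction(n - m, n)
    return s

s4 = km_product(group2, 4)   # 19/21 * 17/19 * 16/17 * 14/16
s8 = km_product(group2, 8)
print(s4, float(s4))
print(s8, float(s8))
```

Because there is no censoring in group 2, each numerator cancels the next denominator, so the product for week 4 collapses to 14/21 and the product for week 8 to 8/21, the same values obtained by simple counting.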


KM formula = product limit
formula

More generally, any KM formula for a survival
probability is limited to product terms up to the
survival week being specified. That is why the KM
formula is often referred to as a “product-limit”
formula.


Group 1 (treatment)

t(j)   n_j   m_j   q_j   Ŝ(t(j))
  0     21    0     0    1
  6     21    3     1    1 × 18/21 = .8571


Next, we consider the KM formula for the data 
from group 1, where there are several censored 
observations. 

The estimated survival probabilities obtained us¬ 
ing the KM formula are shown here for group 1. 

The first survival estimate on the list is S(0) = 1, as 
it will always be, because this gives the probability 
of surviving past time zero. 
EXAMPLE (continued)

Group 1 (treatment)

t(j)   n_j   m_j   q_j   Ŝ(t(j))
  0     21    0     0    1
  6     21    3     1    1 × 18/21 = .8571
  7     17    1     1    .8571 × 16/17 = .8067
 10     15    1     2    .8067 × 14/15 = .7529
 13     12    1     0    .7529 × 11/12 = .6902
 16     11    1     3    .6902 × 10/11 = .6275
 22      7    1     0    .6275 × 6/7 = .5378
 23      6    1     5    .5378 × 5/6 = .4482


Fraction at t(j): Pr(T > t(j) | T ≥ t(j))

Not available at t(j): failed prior to t(j)
or
censored prior to t(j)
(group 1 only)


[Figure: KM Plots for Remission Data. Survival probability (0 to 1) against weeks (0 to 32), with the Group 1 (treatment) curve above the Group 2 (placebo) curve.]

Obtain KM plots from 
computer package, e.g., SAS, 
Stata, 
SPSS 


The other survival estimates are calculated by multiplying
the estimate for the immediately preceding
failure time by a fraction. For example, the
fraction is 18/21 for surviving past week 6, because
21 subjects remain up to week 6 and 3 of these
subjects fail to survive past week 6. The fraction is
16/17 for surviving past week 7, because 17 people
remain up to week 7 and one of these fails to
survive past week 7. The other fractions are calculated
similarly.
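
The running multiplication can be sketched as follows, using the group 1 risk-set sizes and failure counts from the table. The function name `km_estimates` is hypothetical. Censored subjects do not appear explicitly; they only reduce the risk-set size n_j at later failure times.

```python
# Group 1 (treatment) rows from the table: (t_j, n_j, m_j).
group1 = [(6, 21, 3), (7, 17, 1), (10, 15, 1), (13, 12, 1),
          (16, 11, 1), (22, 7, 1), (23, 6, 1)]

def km_estimates(rows):
    """Return [(t_j, S_hat(t_j))], starting from S_hat(0) = 1."""
    s = 1.0
    out = [(0, s)]
    for t, n, m in rows:
        s *= (n - m) / n      # fraction surviving past t_j among those at risk
        out.append((t, s))
    return out

km1 = km_estimates(group1)
for t, s in km1:
    print(f"{t:>2}  {s:.4f}")
```

Rounded to four decimals, the printed estimates agree with the last column of the group 1 table.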

For a specified failure time t(j), the fraction may be
generally expressed as the conditional probability
of surviving past time t(j), given availability (i.e.,
in the risk set) at time t(j). This is exactly the same
formula that we previously used to calculate each
product term in the product limit formula used for
the group 2 data.

Note that a subject might not be available at time
t(j) for one of two reasons: (1) either the subject
has failed prior to t(j), or (2) the subject has been
censored prior to t(j). Group 1 has censored observations,
whereas group 2 does not. Thus, for
group 1, censored observations have to be taken
into account when determining the number available
at t(j).

Plots of the KM curves for groups 1 and 2 are
shown here on the same graph. Notice that the
KM curve for group 1 is consistently higher than
the KM curve for group 2. These figures indicate
that group 1, which is the treatment group,
has better survival prognosis than group 2, the
placebo group. Moreover, as the number of weeks
increases, the two curves appear to get farther
apart, suggesting that the beneficial effects of the
treatment over the placebo are greater the longer
one stays in remission.

The KM plots shown above can be easily obtained 
from most computer packages that perform sur¬ 
vival analysis, including SAS, Stata, and SPSS. All 
the user needs to do is provide a KM computer 
program with the basic data layout and then pro¬ 
vide appropriate commands to obtain plots. 


















III. General Features of KM 
Curves 

General KM formula:

$$\hat{S}(t_{(j)}) = \hat{S}(t_{(j-1)}) \times \Pr(T > t_{(j)} \mid T \ge t_{(j)})$$


The general formula for a KM survival probability
at failure time t(j) is shown here. This formula
gives the probability of surviving past the previous
failure time t(j-1), multiplied by the conditional
probability of surviving past time t(j), given survival
to at least time t(j).


KM formula = product limit
formula

$$\hat{S}(t_{(j-1)}) = \prod_{i=1}^{j-1} \Pr(T > t_{(i)} \mid T \ge t_{(i)})$$


The above KM formula can also be expressed as a
product limit if we substitute for the survival probability
Ŝ(t(j-1)) the product of all fractions that
estimate the conditional probabilities for failure
times t(j-1) and earlier.



For example, the probability of surviving past ten 
weeks is given in the table for group 1 (page 55) 
by .8067 times 14/15, which equals .7529. But the 
.8067 can be alternatively written as the product 
of the fractions 18/21 and 16/17. Thus, the product 
limit formula for surviving past 10 weeks is given 
by the triple product shown here. 

Similarly, the probability of surviving past sixteen 
weeks can be written either as .6902 x 10/11, or 
equivalently as the five-way product of fractions 
shown here. 
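
Written out from the fractions in the group 1 table, the two products referred to above take the following form. This is a reconstruction from the tabulated values, since the original display is not reproduced here.

```latex
\hat{S}(10) = .8067 \times \frac{14}{15}
            = \frac{18}{21} \times \frac{16}{17} \times \frac{14}{15}
            = .7529

\hat{S}(16) = .6902 \times \frac{10}{11}
            = \frac{18}{21} \times \frac{16}{17} \times \frac{14}{15}
              \times \frac{11}{12} \times \frac{10}{11}
            = .6275
```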


$$\hat{S}(t_{(j)}) = \prod_{i=1}^{j} \Pr[T > t_{(i)} \mid T \ge t_{(i)}]
= \hat{S}(t_{(j-1)}) \times \Pr(T > t_{(j)} \mid T \ge t_{(j)})$$


The general expression for the product limit for¬ 
mula for the KM survival estimate is shown here 
together with the general KM formula given ear¬ 
lier. Both expressions are equivalent. 


Math proof: 

Pr(A and B) = Pr(A)xPr(B | A) 
always 


A simple mathematical proof of the KM formula 
can be described in probability terms. One of the 
basic rules of probability is that the probability of 
a joint event, say A and B, is equal to the prob¬ 
ability of one event, say A, times the conditional 
probability of the other event, B, given A. 
A = “T ≥ t(j)”  →  A and B = B
B = “T > t(j)”

Pr(A and B) = Pr(B)


If we let A be the event that a subject survives to
at least time t(j) and we let B be the event that a
subject survives past time t(j), then the joint event
A and B simplifies to the event B, because anyone
who survives past time t(j) has necessarily survived
to at least time t(j). It follows that the probability
of A and B equals the probability of surviving past
time t(j).


No failures during t(j-1) < T < t(j)
Pr(A) = Pr(T > t(j-1)) = S(t(j-1))


Also, because t(j) is the next failure time after
t(j-1), there can be no failures after time t(j-1) and
before time t(j). Therefore, the probability of A is
equivalent to the probability of surviving past the
(j - 1)th ordered failure time.



Furthermore, the conditional probability of B 
given A is equivalent to the conditional probability 
in the KM formula. 


Thus, from Pr(A and B) formula,

Pr(A and B) = Pr(A) × Pr(B | A)
S(t(j)) = S(t(j-1)) × Pr(T > t(j) | T ≥ t(j))

Thus, using the basic rules of probability, the KM
formula can be derived.


IV. The Log-Rank Test for 
Two Groups 

Are KM curves statistically 
equivalent? 


We now describe how to evaluate whether or not 
KM curves for two or more groups are statistically 
equivalent. In this section we consider two groups 
only. The most popular testing method is called 
the log-rank test. 



When we state that two KM curves are “statisti¬ 
cally equivalent,” we mean that, based on a testing 
procedure that compares the two curves in some 
“overall sense,” we do not have evidence to indi¬ 
cate that the true (population) survival curves are 
different. 


• Chi-square test

• Overall comparison of KM 
curves 

• Observed versus expected 
counts 

• Categories defined by ordered 
failure times 



The log-rank test is a large-sample chi-square test 
that uses as its test criterion a statistic that pro¬ 
vides an overall comparison of the KM curves be¬ 
ing compared. This (log-rank) statistic, like many 
other statistics used in other kinds of chi-square 
tests, makes use of observed versus expected cell 
counts over categories of outcomes. The cate¬ 
gories for the log-rank statistic are defined by each 
of the ordered failure times for the entire set of 
data being analyzed. 

As an example of the information required for the 
log-rank test, we again consider the comparison 
of the treatment (group 1) and placebo (group 2) 
subjects in the remission data on 42 leukemia pa¬ 
tients. 

Here, for each ordered failure time, t(j), in the entire
set of data, we show the numbers of subjects
(m_ij) failing at that time, separately by group (i),
followed by the numbers of subjects (n_ij) in the
risk set at that time, also separately by group.

Thus, for example, at week 4, no subjects failed in 
group 1, whereas two subjects failed in group 2. 
Also, at week 4, the risk set for group 1 contains 
21 persons, whereas the risk set for group 2 con¬ 
tains 16 persons. 

Similarly, at week 10, one subject failed in group 1,
and no subjects failed in group 2; the risk sets for
each group contain 15 and 8 subjects, respectively.


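Expected cell counts:

$$e_{1j} = \left(\frac{n_{1j}}{n_{1j} + n_{2j}}\right) \times (m_{1j} + m_{2j})$$

$$e_{2j} = \left(\frac{n_{2j}}{n_{1j} + n_{2j}}\right) \times (m_{1j} + m_{2j})$$

First factor: proportion in risk set
Second factor: # of failures over both groups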


We now expand the previous table to include expected
cell counts and observed minus expected
values for each group at each ordered failure
time. The formula for the expected cell counts
is shown here for each group. For group 1, this
formula computes the expected number at time j
(i.e., e_1j) as the proportion of the total subjects
in both groups who are at risk at time j, that
is, n_1j/(n_1j + n_2j), multiplied by the total number
of failures at that time for both groups (i.e.,
m_1j + m_2j). For group 2, e_2j is computed similarly.
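
As a small illustration of the expected-count formula, the sketch below evaluates it at week 4 of the remission data, where the risk sets contain 21 and 16 persons and the failures are 0 and 2. The function name `expected_counts` is hypothetical.

```python
def expected_counts(n1, n2, m1, m2):
    """Expected failures in each group at one ordered failure time."""
    total_failures = m1 + m2
    e1 = n1 / (n1 + n2) * total_failures   # group 1 share of the risk set
    e2 = n2 / (n1 + n2) * total_failures   # group 2 share of the risk set
    return e1, e2

# Week 4 of the remission data.
e1, e2 = expected_counts(n1=21, n2=16, m1=0, m2=2)
diff1 = round(0 - e1, 2)   # observed minus expected, group 1
diff2 = round(2 - e2, 2)   # observed minus expected, group 2
print(diff1, diff2)
```

The two expected counts always add up to the total number of failures at that time, so the observed minus expected values for the two groups differ only in sign.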
EXAMPLE

Expanded Table (Remission Data)

            # failures    # in risk set   # expected                   Observed - expected
  j  t(j)   m_1j  m_2j    n_1j  n_2j      e_1j          e_2j           m_1j - e_1j   m_2j - e_2j
  1    1     0     2       21    21       (21/42) × 2   (21/42) × 2      -1.00          1.00
  2    2     0     2       21    19       (21/40) × 2   (19/40) × 2      -1.05          1.05
  3    3     0     1       21    17       (21/38) × 1   (17/38) × 1      -0.55          0.55
  4    4     0     2       21    16       (21/37) × 2   (16/37) × 2      -1.14          1.14
  5    5     0     2       21    14       (21/35) × 2   (14/35) × 2      -1.20          1.20
  6    6     3     0       21    12       (21/33) × 3   (12/33) × 3       1.09         -1.09
  7    7     1     0       17    12       (17/29) × 1   (12/29) × 1       0.41         -0.41
  8    8     0     4       16    12       (16/28) × 4   (12/28) × 4      -2.29          2.29
  9   10     1     0       15     8       (15/23) × 1   (8/23) × 1        0.35         -0.35
 10   11     0     2       13     8       (13/21) × 2   (8/21) × 2       -1.24          1.24
 11   12     0     2       12     6       (12/18) × 2   (6/18) × 2       -1.33          1.33
 12   13     1     0       12     4       (12/16) × 1   (4/16) × 1        0.25         -0.25
 13   15     0     1       11     4       (11/15) × 1   (4/15) × 1       -0.73          0.73
 14   16     1     0       11     3       (11/14) × 1   (3/14) × 1        0.21         -0.21
 15   17     0     1       10     3       (10/13) × 1   (3/13) × 1       -0.77          0.77
 16   22     1     1        7     2       (7/9) × 2     (2/9) × 2        -0.56          0.56
 17   23     1     1        6     1       (6/7) × 2     (1/7) × 2        -0.71          0.71
Totals       9    21                      19.26         10.74           -10.26         10.26


# of failure times = 17

$$O_i - E_i = \sum_{j=1}^{17} (m_{ij} - e_{ij}), \quad i = 1, 2$$



When two groups are being compared, the log- 
rank test statistic is formed using the sum of the 
observed minus expected counts over all failure 
times for one of the two groups. In this exam¬ 
ple, this sum is —10.26 for group 1 and 10.26 for 
group 2. We will use the group 2 value to carry out 
the test, but as we can see, except for the minus 
sign, the difference is the same for the two groups. 


Two groups:

O_2 - E_2 = summed observed
minus expected score for group 2

$$\text{Log-rank statistic} = \frac{(O_2 - E_2)^2}{\mathrm{Var}(O_2 - E_2)}$$


For the two-group case, the log-rank statistic,
shown here at the left, is computed by dividing the
square of the summed observed minus expected
score for one of the groups (say, group 2) by the
variance of the summed observed minus expected
score.
$$\mathrm{Var}(O_i - E_i) = \sum_j \frac{n_{1j}\, n_{2j}\, (m_{1j} + m_{2j})\,(n_{1j} + n_{2j} - m_{1j} - m_{2j})}{(n_{1j} + n_{2j})^2\,(n_{1j} + n_{2j} - 1)}, \quad i = 1, 2$$


The expression for the estimated variance is
shown here. For two groups, the variance formula
is the same for each group. This variance formula
involves the number in the risk set in each group
(n_ij) and the number of failures in each group
(m_ij) at time j. The summation is over all distinct
failure times.
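
Putting the observed minus expected sum and the variance formula together, a minimal sketch of the two-group calculation for the remission data follows. The rows are copied from the expanded table; the function name `log_rank` is hypothetical. Because the sketch carries unrounded values through the sums, intermediate quantities can differ in the second decimal place from hand calculations that add up rounded table entries.

```python
# Rows of the expanded table: (t_j, m_1j, m_2j, n_1j, n_2j).
rows = [
    (1, 0, 2, 21, 21), (2, 0, 2, 21, 19), (3, 0, 1, 21, 17),
    (4, 0, 2, 21, 16), (5, 0, 2, 21, 14), (6, 3, 0, 21, 12),
    (7, 1, 0, 17, 12), (8, 0, 4, 16, 12), (10, 1, 0, 15, 8),
    (11, 0, 2, 13, 8), (12, 0, 2, 12, 6), (13, 1, 0, 12, 4),
    (15, 0, 1, 11, 4), (16, 1, 0, 11, 3), (17, 0, 1, 10, 3),
    (22, 1, 1, 7, 2), (23, 1, 1, 6, 1),
]

def log_rank(rows):
    """Two-group log-rank statistic from per-failure-time counts."""
    o_minus_e = 0.0   # summed observed minus expected for group 2
    variance = 0.0
    for _, m1, m2, n1, n2 in rows:
        n, m = n1 + n2, m1 + m2
        o_minus_e += m2 - n2 / n * m
        if n > 1:     # the variance term is undefined with one subject at risk
            variance += n1 * n2 * m * (n - m) / (n ** 2 * (n - 1))
    return o_minus_e, variance, o_minus_e ** 2 / variance

o_minus_e, variance, statistic = log_rank(rows)
print(f"O2 - E2 = {o_minus_e:.2f}, log-rank = {statistic:.2f}")
```

Using group 1 instead of group 2 changes only the sign of the summed difference, so the squared value and hence the statistic are unchanged.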


H_0: no difference between survival
curves

Log-rank statistic ~ χ² with 1 df
under H_0

The null hypothesis being tested is that there is no 
overall difference between the two survival curves. 
Under this null hypothesis, the log-rank statistic 
is approximately chi-square with one degree of 
freedom. Thus, a P-value for the log-rank test is 
determined from tables of the chi-square distri¬ 
bution. 
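
In place of printed tables, the chi-square tail probability can be computed directly. The sketch below uses only the Python standard library and closed-form expressions for the chi-square upper tail with integer degrees of freedom; the function name `chi2_sf` is hypothetical, and a statistics package would normally be used instead.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability Pr(chi-square with `df` degrees of freedom > x)."""
    half = x / 2.0
    if df % 2 == 0:
        tail, k = math.exp(-half), 2               # exact tail for 2 df
    else:
        tail, k = math.erfc(math.sqrt(half)), 1    # exact tail for 1 df
    # Step up two degrees of freedom at a time using the standard recurrence.
    while k < df:
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail

# P-value for a log-rank statistic of 16.79 on 1 degree of freedom.
p_value = chi2_sf(16.79, df=1)
print(f"{p_value:.6f}")
```

The `df` argument is kept general so that the same function can be reused when more than two groups are compared.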


Computer programs: 

Stata’s “sts test”: 

• descriptive statistics for KM 
curves 

• log-rank statistic 

• Alternative statistics to log-rank 
statistic 


Several computer programs are available for
calculating the log-rank statistic. For example,
the Stata package has a command called “sts
test” that computes descriptive information about
Kaplan-Meier curves, the log-rank statistic, and
alternative statistics to the log-rank statistic, to
be described later. Other packages, like SAS and
SPSS, have procedures that provide results similar
to those of Stata. A comparison of Stata,
SAS, and SPSS procedures and output is provided
in the Computer Appendix at the back of this
text.


EXAMPLE

Using Stata: Remission Data

          Events     Events
Group     observed   expected
1             9       19.25
2            21       10.75
Total        30       30.00

Log-rank = chi2(1) = 16.79
P-value = Pr > chi2 = 0.000



For the remission data, the edited printout from 
using the Stata “sts test” procedure is shown here. 
The log-rank statistic is 16.79 and the correspond¬ 
ing P-value is zero to three decimal places. This 
P-value indicates that the null hypothesis should 
be rejected. We can therefore conclude that the 
treatment and placebo groups have significantly 
different KM survival curves. 
EXAMPLE

O_2 - E_2 = 10.26
Var(O_2 - E_2) = 6.2685

$$\text{Log-rank statistic} = \frac{(O_2 - E_2)^2}{\mathrm{Var}(O_2 - E_2)} = \frac{(10.26)^2}{6.2685} = 16.793$$


Although the use of a computer is the easiest way
to calculate the log-rank statistic, we provide here
some of the details of the calculation. We have already
seen from earlier computations that the
value of O_2 - E_2 is 10.26. The estimated variance
of O_2 - E_2 is computed from the variance formula
above to be 6.2685. The log-rank statistic then is
obtained by squaring 10.26 and dividing by 6.2685,
which yields 16.793, as shown on the computer
printout.


Approximate formula:

$$X^2 \approx \sum_{i}^{\#\text{ of groups}} \frac{(O_i - E_i)^2}{E_i}$$

An approximation to the log-rank statistic, shown 
here, can be calculated using observed and ex¬ 
pected values for each group without having to 
compute the variance formula. The approximate 
formula is of the classic chi-square form that sums 
over each group being compared the square of the 
observed minus expected value divided by the ex¬ 
pected value. 



The calculation of the approximate formula is
shown here for the remission data. The expected
values are 19.26 and 10.74 for groups 1 and
2, respectively. The chi-square value obtained is
15.267, which is slightly smaller than the log-rank
statistic of 16.793.
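
The approximate formula needs only the observed and expected totals for each group, as the following sketch shows for the remission data. The function name `approx_log_rank` is hypothetical.

```python
def approx_log_rank(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over the groups being compared."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Remission data: observed and expected failures for groups 1 and 2.
approx = approx_log_rank(observed=[9, 21], expected=[19.26, 10.74])
print(round(approx, 3))
```

The same function accepts any number of groups, since it simply sums one term per group.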


V. The Log-Rank Test for 
Several Groups 

H_0: All survival curves are the
same.


The log-rank test can also be used to compare 
three or more survival curves. The null hypothesis 
for this more general situation is that all survival 
curves are the same. 


Log-rank statistic for > 2 groups
involves variances and covariances
of O_i - E_i.

Matrix formula: See Appendix at 
end of this chapter. 


Although the same tabular layout can be used to 
carry out the calculations when there are more 
than two groups, the test statistic is more com¬ 
plicated mathematically, involving both variances 
and covariances of summed observed minus ex¬ 
pected scores for each group. A convenient math¬ 
ematical formula can be given in matrix terms. 
We present the matrix formula for the inter¬ 
ested reader in an Appendix at the end of this 
chapter. 
Use computer program for
calculations.


G (≥ 2) groups:
log-rank statistic ~ χ² with G - 1 df


Approximation formula:

$$X^2 \approx \sum_{i}^{\#\text{ of groups}} \frac{(O_i - E_i)^2}{E_i}$$

Not required because computer
program calculates the exact
log-rank statistic


We will not describe further details about the cal¬ 
culation of the log-rank statistic, because a com¬ 
puter program can easily carry out the computa¬ 
tions from the basic data file. Instead, we illustrate 
the use of this test with data involving more than 
two groups. 

If the number of groups being compared is
G (≥ 2), then the log-rank statistic has approximately
a large-sample chi-square distribution with
G - 1 degrees of freedom. Therefore, the decision
about significance is made using chi-square tables
with the appropriate degrees of freedom.

The approximate formula previously described in¬ 
volving only observed and expected values with¬ 
out variance or covariance calculations can also 
be used when there are more than two groups be¬ 
ing compared. However, practically speaking, the 
use of this approximate formula is not required as 
long as a computer program is available to calcu¬ 
late the exact log-rank statistic. 


We now provide an example to illustrate the use of 
the log-rank statistic to compare more than two 
groups. 


EXAMPLE 


vets.dat: survival time in days,
n = 137

Veteran's Administration Lung Cancer Trial
Column 1: Treatment (standard = 1, test = 2)
Column 2: Cell type 1 (large = 1, other = 0)
Column 3: Cell type 2 (adeno = 1, other = 0)
Column 4: Cell type 3 (small = 1, other = 0)
Column 5: Cell type 4 (squamous = 1, other = 0)
Column 6: Survival time (days)
Column 7: Performance status
          (0 = worst ... 100 = best)
Column 8: Disease duration (months)
Column 9: Age
Column 10: Prior therapy (none = 0, some = 1)
Column 11: Status (0 = censored, 1 = died)

The data set “vets.dat” considers survival times in 
days for 137 patients from the Veteran’s Admin¬ 
istration Lung Cancer Trial cited by Kalbfleisch 
and Prentice in their text (The Statistical Analysis 
of Survival Time Data, John Wiley, pp. 223-224, 
1980). A complete list of the variables is shown 
here. Failure status is defined by the status vari¬ 
able (column 11). 

Among the variables listed, we now focus on the
performance status variable (column 7). This variable
is a continuous variable, so before we can
obtain KM curves and the log-rank test, we need
to categorize this variable.
EXAMPLE (continued)

Performance Status Categories

Group #   Categories   Size
1         0-59          52
2         60-74         50
3         75-100        35


[Figure: KM curves for performance status groups; survival probability against days (0 to 600)]

          Events     Events
Group     observed   expected
1            50       26.30
2            47       55.17
3            31       46.53
Total       128      128.00

Log-rank = chi2(2) = 29.18
P-value = Pr > chi2 = 0.0000
G = 3 groups; df = G - 1 = 2

Log-rank test is highly significant.

Conclude significant difference among
three survival curves.


If, for the performance status variable, we choose 
the categories 0-59, 60-74, and 75-100, we obtain 
three groups of sizes 52, 50, and 35, respectively. 


The KM curves for each of three groups are shown 
here. Notice that these curves appear to be quite 
different. A test of significance of this difference is 
provided by the log-rank statistic. 


An edited printout of descriptive information
about the three KM curves, together with the
log-rank test results, is shown here. These results
were obtained using the Stata package.


Because three groups are being compared here, 
G = 3 and the degrees of freedom for the log- 
rank test is thus G - 1, or 2. The log-rank statistic 
is computed to be 29.181, which has a P-value of 
zero to three decimal places. Thus, the conclusion 
from the log-rank test is that there is a highly sig¬ 
nificant difference among the three survival curves 
for the performance status groups. 


VI. Alternatives to the Log
Rank Test

Alternative tests supported by Stata

Wilcoxon

Tarone-Ware

Peto

Fleming-Harrington


There are several alternatives to the log rank test
offered by Stata, SAS, and SPSS designed to test
the hypothesis that two or more survival curves
are equivalent. In this section we describe the
Wilcoxon, the Tarone-Ware, the Peto, and the
Fleming-Harrington tests. All of these tests
are variations of the log rank test and are easily
implemented in Stata.
Log rank uses

$$O_i - E_i = \sum_j (m_{ij} - e_{ij})$$

i = group #
j = jth failure time


In describing the differences among these tests,
recall that the log rank test uses the summed observed
minus expected score O_i - E_i in each group
to form the test statistic. This simple sum gives
the same weight (namely, unity) to each failure
time when combining observed minus expected
failures in each group.


Weighting the test statistic for two
groups

Test statistic:

$$\frac{\left(\sum_j w(t_j)\,(m_{ij} - e_{ij})\right)^2}{\mathrm{Var}\left(\sum_j w(t_j)\,(m_{ij} - e_{ij})\right)}$$

i = 1, 2
j = jth failure time
w(t_j) = weight at jth failure time


The Wilcoxon, Tarone-Ware, Peto, and
Fleming-Harrington test statistics are
variations of the log rank test statistic and are
derived by applying different weights at the jth
failure time (as shown on the left for two groups).

Wilcoxon Test

• w(t_j) = n_j (number at risk)

• Earlier failures receive more
weight

• Appropriate if treatment effect
is strongest in earliest phases
of administration


The Wilcoxon test (called the Breslow test in
SPSS) weights the observed minus expected score
at time t_j by the number at risk, n_j, over all groups
at time t_j. Thus, the Wilcoxon test places more emphasis
on the information at the beginning of the
survival curve, where the number at risk is large,
allowing early failures to receive more weight than
later failures. This type of weighting may be used
to assess whether the effect of a treatment on survival
is strongest in the earlier phases of administration
and tends to be less effective over time.


Weights Used for Various Test
Statistics

Test Statistic        w(t_j)
Log rank              1
Wilcoxon              n_j
Tarone-Ware           $\sqrt{n_j}$
Peto                  $\tilde{s}(t_j)$
Fleming-Harrington    $\hat{S}(t_{j-1})^p\,[1 - \hat{S}(t_{j-1})]^q$



The Tarone-Ware test statistic also applies more
weight to the early failure times by weighting
the observed minus expected score at time t_j by
the square root of the number at risk, $\sqrt{n_j}$. The
Peto test weights the jth failure time by the survival
estimate $\tilde{s}(t_j)$ calculated over all groups combined.
This survival estimate $\tilde{s}(t_j)$ is similar but
not exactly equal to the Kaplan-Meier survival
estimate. The Fleming-Harrington test uses
the Kaplan-Meier survival estimate $\hat{S}(t)$ over all
groups to calculate its weights for the jth failure
time, $\hat{S}(t_{j-1})^p\,[1 - \hat{S}(t_{j-1})]^q$. The weights for each
of these test statistics are summarized on the left.
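
To make the role of the weights concrete, the sketch below applies the log rank, Wilcoxon, and Tarone-Ware weights to the two-group remission counts. The function name `weighted_test` is hypothetical. The variance of the weighted sum is taken to be the sum of squared weights times the per-failure-time variance terms, which is the usual form but is an assumption here because this section does not spell it out. The Peto and Fleming-Harrington weights are omitted because they require a pooled survival estimate.

```python
import math

# Remission data rows: (t_j, m_1j, m_2j, n_1j, n_2j).
rows = [
    (1, 0, 2, 21, 21), (2, 0, 2, 21, 19), (3, 0, 1, 21, 17),
    (4, 0, 2, 21, 16), (5, 0, 2, 21, 14), (6, 3, 0, 21, 12),
    (7, 1, 0, 17, 12), (8, 0, 4, 16, 12), (10, 1, 0, 15, 8),
    (11, 0, 2, 13, 8), (12, 0, 2, 12, 6), (13, 1, 0, 12, 4),
    (15, 0, 1, 11, 4), (16, 1, 0, 11, 3), (17, 0, 1, 10, 3),
    (22, 1, 1, 7, 2), (23, 1, 1, 6, 1),
]

def weighted_test(rows, weight):
    """Two-group weighted statistic; `weight` maps the total at risk n_j to w(t_j)."""
    num = 0.0
    var = 0.0
    for _, m1, m2, n1, n2 in rows:
        n, m = n1 + n2, m1 + m2
        w = weight(n)
        num += w * (m2 - n2 / n * m)          # weighted observed minus expected
        if n > 1:
            var += w ** 2 * n1 * n2 * m * (n - m) / (n ** 2 * (n - 1))
    return num ** 2 / var

log_rank_stat = weighted_test(rows, lambda n: 1.0)       # w = 1
wilcoxon_stat = weighted_test(rows, lambda n: float(n))  # w = n_j
tarone_ware_stat = weighted_test(rows, math.sqrt)        # w = sqrt(n_j)
print(f"{log_rank_stat:.2f} {wilcoxon_stat:.2f} {tarone_ware_stat:.2f}")
```

Passing the weight as a function keeps the three tests in one routine, so the only thing that differs between them is how much each failure time counts.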
Fleming-Harrington Test

if p = 1 and q = 0, $w(t) = \hat{S}(t_{j-1})$
if p = 0 and q = 1, $w(t) = 1 - \hat{S}(t_{j-1})$
if p = 0 and q = 0, w(t) = 1 (log rank test)


Comparisons of Test Results: Remission Data, Testing Treatment (RX)

Test                 Chi-square (1 df)   P-value
Log rank             16.79               0.0000
Wilcoxon             13.46               0.0002
Tarone-Ware          15.12               0.0001
Peto                 14.08               0.0002
FH (p = 3, q = 1)     8.99               0.0027
FH (p = 1, q = 3)    12.26               0.0005


Vets Data, 3-Level Performance Status

Test                 Chi-square (2 df)   P-value
Log rank             29.18               0.0000
Wilcoxon             46.10               0.0000

Remission Data, 2-Level Treatment

Test                 Chi-square (1 df)   P-value
Log rank             16.79               0.0000
Wilcoxon             13.46               0.0002


The Fleming-Harrington test allows the most
flexibility in terms of the choice of weights because
the user provides the values of p and q. For example,
if p = 1 and q = 0 then $w(t) = \hat{s}(t_{j-1})$, which
gives more weight to the earlier survival times,
when $\hat{s}(t_{j-1})$ is close to one. However, if p = 0 and
q = 1 then $w(t) = 1 - \hat{s}(t_{j-1})$, in which case the
later survival times receive more weight. If p = 0
and q = 0 then $w(t) = 1$, and the Fleming-Harrington
test reduces to the log rank test.
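
The effect of p and q on the weight can be explored numerically. The short Python sketch below is illustrative only; the function name `fh_weight` is hypothetical, and its argument `s_prev` stands for the Kaplan-Meier estimate at the previous failure time, $\hat{s}(t_{j-1})$.

```python
def fh_weight(s_prev, p, q):
    """Weight s_prev**p * (1 - s_prev)**q at one failure time."""
    return s_prev ** p * (1.0 - s_prev) ** q


# Tabulate the weight over a grid of survival estimates for several (p, q).
grid = [i / 100 for i in range(0, 101)]
for p, q in [(0, 0), (1, 0), (0, 1), (3, 1), (1, 3)]:
    weights = [fh_weight(s, p, q) for s in grid]
    s_at_max = grid[weights.index(max(weights))]
    print(f"p={p}, q={q}: weight at s=0.9 is {fh_weight(0.9, p, q):.3f}, "
          f"largest weight on the grid at s={s_at_max}")
```

Because the weight is $s^p(1-s)^q$, for p > 0 and q > 0 it is largest where the pooled survival estimate equals p/(p + q), so intermediate choices of p and q emphasize the middle of the survival curve rather than either end.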

On the left is a comparison of test results for the
effect of treatment (vs. placebo) using the remission
data. The log rank chi-square statistic (also
displayed previously in this chapter) is the highest
among these tests at 16.79. The Fleming-Harrington
(FH) test with p = 3 and q = 1 yielded
the lowest chi-square value at 8.99, although with
this weighting it is not immediately obvious which
part of the survival curve is getting the most
weight. However, all the test results are highly significant,
yielding a similar conclusion to reject the
null hypothesis.


On the left are comparisons of the log rank and
Wilcoxon tests for the 3-level performance status
variable from the vets dataset discussed in the previous
section. The Wilcoxon test yields a higher
chi-square value (46.10) than the log rank test
(29.18). In contrast, the log rank test for the effect
of treatment (RX) from the remission data
yields a higher chi-square value (16.79) than the
Wilcoxon test (13.46). However, both the Wilcoxon
and log rank tests are highly significant for both
performance status and treatment.
[Figure: KM curves for performance status groups; vertical axis from 0.0 to 1.0, horizontal axis from 0 to 600]



A comparison of survival curves gives insight into
why the Wilcoxon test yields a higher chi-square
value than the log rank test for the 3-level performance
status variable. The 3 curves being compared
are farthest apart in the early part of follow-up
before becoming closer later. By contrast, a
comparison of the 2 curves for treatment shows
the curves diverging over time.


KM Plots for Remission Data 



Choosing a Test 

• Results of different weightings 
usually lead to similar 
conclusions 

• The best choice is test with 
most power 

• Power depends on how the 
null is violated 

• There may be a clinical reason 
to choose a particular weighting 

• Choice of weighting should be 
a priori 


In general, the various weightings should provide
similar results and will usually lead to the same
decision as to whether the null hypothesis is rejected.
The choice of which weighting of the test
statistic to use (e.g., log rank or Wilcoxon) depends
on which test is believed to provide the greatest
statistical power, which in turn depends on how it
is believed the null hypothesis is violated.

If there is a clinical reason to believe the effect of
an exposure is more pronounced toward the beginning
(or end) of the survival function, then it
makes sense to use a weighted test statistic. However,
one should make an a priori decision on
which statistical test to use rather than fish for
a desired p-value. Fishing for a desired result may
lead to bias.
Stratified log rank test

• O − E scores calculated within
strata

• O − E scores then summed
across strata

• Allows control of stratified
variable


Stratified log-rank test

-> lwbc3 = 1
        |  Events     Events
rx      |  observed   expected
0       |     0         2.91
1       |     4         1.09
Total   |     4         4.00

-> lwbc3 = 2
        |  Events     Events
rx      |  observed   expected
0       |     5         7.36
1       |     5         2.64
Total   |    10        10.00

-> lwbc3 = 3
        |  Events     Events
rx      |  observed   expected
0       |     4         6.11
1       |    12         9.89
Total   |    16        16.00

-> Total
        |  Events     Events
rx      |  observed   expected (*)
0       |     9        16.38
1       |    21        13.62
Total   |    30        30.00

(*) sum over calculations within lwbc3
chi2(1) = 10.14, Pr > chi2 = 0.0014


The stratified log rank test is another variation
of the log rank test. With this test the summed
observed minus expected scores O − E are calculated
within strata of each group and then
summed across strata. The stratified log rank
test provides a method of testing the equivalence
of survival curves controlling for the
stratified variable. An example of the stratified
log rank test is presented next using the remission
data.

On the left is Stata output from performing a stratified
log rank test for the effect of treatment (RX)
stratified by a 3-level variable (LWBC3) indicating
low, medium, or high log white blood cell count
(coded 1, 2, and 3, respectively).


Within each stratum of LWBC3, the expected number
of events is calculated for the treated group
(RX = 0) and for the placebo group (RX = 1). The
total expected number of events for the treated
group is found by summing the expected number
of events over the three strata: 2.91 + 7.36 +
6.11 = 16.38. Similarly, the total expected number
of events for the placebo group is calculated:
1.09 + 2.64 + 9.89 = 13.62. This compares to 9
observed cases from the treated group and 21 observed
cases from the placebo group, yielding a
chi-square value of 10.14 with 1 degree of freedom (for
2 levels of treatment) and a corresponding p-value
of 0.0014.
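
The summation across strata can be checked with a few lines of Python. The sketch below is only an illustration; it copies the observed and expected counts from the Stata output and uses hypothetical variable names. It does not reproduce the chi-square value of 10.14, which also requires the variances summed across strata, and these are not shown in the output.

```python
# Observed and expected events per LWBC3 stratum, as (RX = 0, RX = 1).
strata = {
    1: {"observed": (0, 4), "expected": (2.91, 1.09)},
    2: {"observed": (5, 5), "expected": (7.36, 2.64)},
    3: {"observed": (4, 12), "expected": (6.11, 9.89)},
}

# Sum within each treatment group across the three strata.
obs_total = [sum(s["observed"][g] for s in strata.values()) for g in (0, 1)]
exp_total = [round(sum(s["expected"][g] for s in strata.values()), 2)
             for g in (0, 1)]
o_minus_e = [round(o - e, 2) for o, e in zip(obs_total, exp_total)]

print("observed:", obs_total)   # treated, placebo
print("expected:", exp_total)
print("O - E:   ", o_minus_e)
```

The two O − E totals are equal in size and opposite in sign, as they must be when the observed and expected totals both sum to 30.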


Recall that when we did not control for log white 
blood cell count, the log rank test for the effect of 
treatment yielded a chi-square value of 16.79 and 
a corresponding p-value rounded to 0.0000. 














Log rank unstratified

$O_i - E_i = \sum_j (m_{ij} - e_{ij})$

i = group #, j = jth failure time

Log rank stratified

$O_i - E_i = \sum_s \sum_j (m_{ijs} - e_{ijs})$

i = group #, j = jth failure time,
s = stratum #


The only difference between the unstratified and
stratified approaches is that for the unstratified
approach, the observed minus expected number
of events for each failure time is summed over all
failure times for each group (i). With the stratified
approach, the observed minus expected number
of events is summed over all failure times for each
group within each stratum and then summed over
all strata. Either way, the null distribution is
chi-square with G − 1 degrees of freedom, where G
represents the number of groups being compared
(not the number of strata).


Stratified or unstratified (G groups)
Under $H_0$:

log rank statistic $\sim \chi^2$ with
G − 1 df


Can stratify with other tests:
Wilcoxon, Tarone-Ware,
Peto, Fleming-Harrington

Limitation 

Sample-size may be small within 
strata 


The stratified approach can also be applied to any
of the weighted variations of the log rank test (e.g.,
Wilcoxon). A limitation of the stratified approach
is the reduced sample size within each stratum.
This is particularly problematic with the remission
dataset, which has a small sample size to begin
with.


Alternatively 

Test associations using modeling 

• Can simultaneously control 
covariates 

• Shown in next chapter 


We have shown how the stratified log rank test 
can be used to test the effect of treatment while 
controlling for log white blood cell count. In the 
next chapter we show how modeling can be used 
to test an association of a predictor variable while 
simultaneously controlling for other covariates. 


VII. Summary 

KM curves: 



We now briefly summarize this presentation. First, 
we described how to estimate and graph survival 
curves using the Kaplan-Meier (KM) method. 


$t_{(j)}$: jth ordered failure time

$\hat{S}(t_{(j)}) = \prod_{i=1}^{j} \Pr[T > t_{(i)} \mid T \ge t_{(i)}]$

$\phantom{\hat{S}(t_{(j)})} = \hat{S}(t_{(j-1)}) \times \Pr(T > t_{(j)} \mid T \ge t_{(j)})$

To compute KM curves, we must form a data layout
that orders the failure times from smallest to
largest. For each ordered failure time, the estimated
survival probability is computed using the
product limit formula shown here. Alternatively,
this estimate can be computed as the product of
the survival estimate for the previous failure time
multiplied by the conditional probability of surviving
past the current failure time.
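
The recursive form of the product limit formula translates directly into a loop. The following Python sketch is illustrative only; the function name `kaplan_meier` is hypothetical, and the data are the remission times for the treatment group, with 1 denoting a failure and 0 a censored observation.

```python
def kaplan_meier(times, events):
    """Return [(t_(j), S_hat(t_(j)))] for each ordered failure time."""
    failure_times = sorted({t for t, d in zip(times, events) if d == 1})
    surv = 1.0   # estimate before the first failure time
    curve = []
    for tj in failure_times:
        n_j = sum(1 for t in times if t >= tj)            # risk set at t_(j)
        m_j = sum(1 for t, d in zip(times, events)
                  if t == tj and d == 1)                  # failures at t_(j)
        surv *= (n_j - m_j) / n_j    # previous estimate x conditional survival
        curve.append((tj, surv))
    return curve


# Remission times (weeks) for the treatment group: 9 failures, 12 censored.
treated_t = [6, 6, 6, 7, 10, 13, 16, 22, 23,
             6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35]
treated_d = [1] * 9 + [0] * 12

for tj, s in kaplan_meier(treated_t, treated_d):
    print(f"t = {tj:2d}   S_hat = {s:.4f}")
```

Subjects censored at a failure time are counted in the risk set at that time, which is why 21 subjects are at risk at week 6 even though one of them is censored at 6 weeks.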


Log-rank test:

$H_0$: common survival curve for
all groups

Log-rank statistic $= \dfrac{(O_2 - E_2)^2}{\operatorname{Var}(O_2 - E_2)}$

log-rank statistic $\sim \chi^2$ with G − 1
df under $H_0$

G = # of groups


When survival curves are being compared, the log-rank
test gives a statistical test of the null hypothesis
of a common survival curve. For two groups,
the log-rank statistic is based on the summed observed
minus expected scores for a given group
and its variance estimate. For several groups, a
computer should always be used since the log-rank
formula is more complicated mathematically.
The test statistic is approximately chi-square
in large samples with G − 1 degrees of freedom,
where G denotes the number of groups being compared.


This presentation is now complete. You can review
this presentation using the detailed outline that
follows and then try the practice exercises and test.


Chapters

1. Introduction

✓ 2. Kaplan-Meier Survival Curves
and the Log-Rank Test


Next: 

3. The Cox Proportional Hazards 
Model and Its Characteristics 


Chapter 3 introduces the Cox proportional hazards
(PH) model, which is the most popular mathematical
modeling approach for estimating survival
curves when considering several explanatory
variables simultaneously.
Detailed Outline


I. Review (pages 48-50) 

A. The outcome variable is (survival) time until an 
event (failure) occurs. 

B. Key problem: censored data, i.e., don't know 
survival time exactly. 

C. Notation: T = survival time random variable

t = specific value of T
δ = (0, 1) variable for failure/
censorship status
S(t) = survivor function
h(t) = hazard function

D. Properties of survivor function: 

i. theoretically, graph is smooth curve, decreasing
from S(t) = 1 at time t = 0 to S(t) = 0 at
t = ∞;

ii. in practice, graph is step function. 

E. Properties of h(t):

i. instantaneous potential for failing given
survival up to time;

ii. h(t) is a rate; ranges from 0 to ∞.

F. Relationship of S(t) to h(t): if you know one you
can determine the other.

G. Goals of survival analysis: estimation of survivor 
and hazard functions; comparisons and 
relationships of explanatory variables to survival. 

H. Data layouts 

i. for the computer; 

ii. for understanding the analysis: involves risk 
sets. 

II. An Example of Kaplan-Meier Curves (pages 51-55) 

A. Data are from study of remission times in weeks for 
two groups of leukemia patients (21 in each group). 

B. Group 1 (treatment group) has several censored 
observations, whereas group 2 has no censored 
observations. 

C. Table of ordered failure times is provided for each 
group. 

D. For group 2 (all noncensored), survival probabilities
are estimated directly and plotted. Formula used is

$\hat{S}(t_{(j)}) = \dfrac{\#\ \text{surviving past } t_{(j)}}{21}$

E. Alternative approach for group 2 is given by a
product limit formula.

F. For group 1, survival probabilities calculated by 
multiplying estimate for immediately preceding 
failure time by a conditional probability of 
surviving past current failure time, i.e., 


$\hat{S}(t_{(j)}) = \hat{S}(t_{(j-1)}) \times \Pr[T > t_{(j)} \mid T \ge t_{(j)}]$.


III. General Features of KM Curves (pages 56-57) 

A. Two alternative general formulae: 


$\hat{S}(t_{(j)}) = \prod_{i=1}^{j} \Pr[T > t_{(i)} \mid T \ge t_{(i)}]$ (product limit formula)

$\hat{S}(t_{(j)}) = \hat{S}(t_{(j-1)}) \Pr[T > t_{(j)} \mid T \ge t_{(j)}]$

B. Second formula derived from probability rule: 
Pr(A and B) = Pr(A) x Pr(B|A) 

IV. The Log-Rank Test for Two Groups (pages 57-61) 

A. Large sample chi-square test; provides overall 
comparison of KM curves. 

B. Uses observed versus expected counts over 
categories of outcomes, where categories are 
defined by ordered failure times for entire set of 
data. 

C. Example provided using remission data involving 
two groups: 

i. expanded table described to show how 
expected and observed minus expected cell 
counts are computed. 

ii. for ith group at time j, where i = 1 or 2:

observed counts = $m_{ij}$,
expected counts = $e_{ij}$, where
expected counts = (proportion in risk set) ×
(# failures over both groups), i.e.,

$e_{ij} = \left(\dfrac{n_{ij}}{n_{1j} + n_{2j}}\right)(m_{1j} + m_{2j})$

D. Log-rank statistic for two groups:

$\dfrac{(O_i - E_i)^2}{\operatorname{Var}(O_i - E_i)}$,

where i = 1, 2,

$O_i - E_i = \sum_j (m_{ij} - e_{ij})$, and

$\operatorname{Var}(O_i - E_i) = \sum_j \dfrac{n_{1j}\, n_{2j}\, (m_{1j} + m_{2j})(n_{1j} + n_{2j} - m_{1j} - m_{2j})}{(n_{1j} + n_{2j})^2 (n_{1j} + n_{2j} - 1)}$, i = 1, 2

E. $H_0$: no difference between survival curves.

F. Log-rank statistic $\sim \chi^2$ with 1 df under $H_0$.

G. Approximate formula:

$X^2 = \sum_{i=1}^{G} \dfrac{(O_i - E_i)^2}{E_i}$, where G = 2 = # of groups

H. Remission data example: Log-rank statistic =
16.793, whereas $X^2$ = 15.276.

V. The Log-Rank Test for Several Groups 

(pages 61-63) 

A. Involves variances and covariances; matrix 
formula in Appendix. 

B. Use computer for calculations. 

C. Under $H_0$, log-rank statistic $\sim \chi^2$ with G − 1 df,
where G = # of groups.

D. Example provided using vets.dat with interval 
variable “performance status”; this variable is 
categorized into G = 3 groups, so df for log-rank 
test is G — 1 = 2, log-rank statistic is 29.181 

(P = 0.0). 

VI. Alternatives to the Log-rank Test (pages 63-68) 

A. Alternative tests supported by Stata:

Wilcoxon, Tarone-Ware, Peto, and
Fleming-Harrington.

B. Alternative tests differ by applying different 
weights at the j-th failure time. 

C. The choice of alternative depends on the reason 
for the belief that the effect is more pronounced 
towards the beginning (or end) of the survival 
function. 

D. The stratified-log-rank test is a variation of the 
log-rank test that controls for one or more 
stratified variables. 

VII. Summary (page 68) 
Practice Exercises


1. The following data are a sample from the 1967-1980 Evans 
County study. Survival times (in years) are given for two study 
groups, each with 25 participants. Group 1 has no history of 
chronic disease (CHR = 0), and group 2 has a positive history 
of chronic disease (CHR = 1): 

Group 1 (CHR = 0): 12.3+, 5.4, 8.2, 12.2+, 11.7, 10.0, 5.7,
9.8, 2.6, 11.0, 9.2, 12.1+, 6.6, 2.2,
1.8, 10.2, 10.7, 11.1, 5.3, 3.5, 9.2,
2.5, 8.7, 3.8, 3.0

Group 2 (CHR = 1): 5.8, 2.9, 8.4, 8.3, 9.1, 4.2, 4.1, 1.8, 3.1,
11.4, 2.4, 1.4, 5.9, 1.6, 2.8, 4.9, 3.5,
6.5, 9.9, 3.6, 5.2, 8.8, 7.8, 4.7, 3.9

a. Fill in the missing information in the following table 
of ordered failure times for groups 1 and 2: 


Group 1                                   |  Group 2
t_(j)   n_j   m_j   q_j   S(t_(j))        |  t_(j)   n_j   m_j   q_j   S(t_(j))
 0.0    25     0     0    1.00            |   0.0    25     0     0    1.00
 1.8    25     1     0     .96            |   1.4    25     1     0     .96
 2.2    24     1     0     .92            |   1.6    24     1     0     .92
 2.5    23     1     0     .88            |   1.8    23     1     0     .88
 2.6    22     1     0     .84            |   2.4    22     1     0     .84
 3.0    21     1     0     .80            |   2.8    21     1     0     .80
 3.5    20    __    __     __             |   2.9    20     1     0     .76
 3.8    19     1     0     .72            |   3.1    19     1     0     .72
 5.3    18     1     0     .68            |   3.5    18     1     0     .68
 5.4    17     1     0     .64            |   3.6    17     1     0     .64
 5.7    16     1     0     .60            |   3.9    __    __    __     __
 6.6    15     1     0     .56            |   4.1    __    __    __     __
 8.2    14     1     0     .52            |   4.2    __    __    __     __
 8.7    13     1     0     .48            |   4.7    13     1     0     .48
 9.2    __    __    __     __             |   4.9    12     1     0     .44
 9.8    10     1     0     .36            |   5.2    11     1     0     .40
10.0     9     1     0     .32            |   5.8    10     1     0     .36
10.2     8     1     0     .28            |   5.9     9     1     0     .32
10.7     7     1     0     .24            |   6.5     8     1     0     .28
11.0     6     1     0     .20            |   7.8     7     1     0     .24
11.1     5     1     0     .16            |   8.3     6     1     0     .20
11.7     4    __    __     __             |   8.4     5     1     0     .16
                                          |   8.8     4     1     0     .12
                                          |   9.1    __    __    __     __
                                          |   9.9    __    __    __     __
                                          |  11.4     1     1     0     .00
b. Based on your results in part a, plot the KM curves
for groups 1 and 2 on the same graph. Comment on 
how these curves compare with each other. 

c. Fill in the following expanded table of ordered failure 
times to allow for the computation of expected and 
observed minus expected values at each ordered 
failure time. Note that your new table here should 
combine both groups of ordered failure times into 
one listing and should have the following format: 



t_(j)   m_1j   m_2j   n_1j   n_2j   e_1j   e_2j   m_1j − e_1j   m_2j − e_2j
d. Use the results in part c to compute the log-rank
statistic. Use this statistic to carry out the log-rank 
test for these data. What is your null hypothesis and 
how is the test statistic distributed under this null 
hypothesis? What are your conclusions from the 
test? 

2. The following data set called “anderson.dat” consists of remission
survival times on 42 leukemia patients, half of whom
get a certain new treatment therapy and the other half of
whom get a standard treatment therapy. The exposure variable
of interest is treatment status (Rx = 0 if new treatment,
Rx = 1 if standard treatment). Two other variables for control
as potential confounders are log white blood cell count
(i.e., logwbc) and sex. Failure status is defined by the relapse
variable (0 if censored, 1 if failure). The data set is listed as
follows:


Subj   Survt   Relapse   Sex   log WBC   Rx
  1     35       0        1     1.45      0
  2     34       0        1     1.47      0
  3     32       0        1     2.20      0
  4     32       0        1     2.53      0
  5     25       0        1     1.78      0
  6     23       1        1     2.57      0
  7     22       1        1     2.32      0
  8     20       0        1     2.01      0
  9     19       0        0     2.05      0
 10     17       0        0     2.16      0
 11     16       1        1     3.60      0
 12     13       1        0     2.88      0
 13     11       0        0     2.60      0
 14     10       0        0     2.70      0
 15     10       1        0     2.96      0
 16      9       0        0     2.80      0
 17      7       1        0     4.43      0
 18      6       0        0     3.20      0
 19      6       1        0     2.31      0
 20      6       1        1     4.06      0
 21      6       1        0     3.28      0
 22     23       1        1     1.97      1
 23     22       1        0     2.73      1
 24     17       1        0     2.95      1
 25     15       1        0     2.30      1
 26     12       1        0     1.50      1
 27     12       1        0     3.06      1
 28     11       1        0     3.49      1
 29     11       1        0     2.12      1
 30      8       1        0     3.52      1
 31      8       1        0     3.05      1
 32      8       1        0     2.32      1
 33      8       1        1     3.26      1
 34      5       1        1     3.49      1
 35      5       1        0     3.97      1
 36      4       1        1     4.36      1
 37      4       1        1     2.42      1
 38      3       1        1     4.01      1
 39      2       1        1     4.91      1
 40      2       1        1     4.48      1
 41      1       1        1     2.80      1
 42      1       1        1     5.00      1



a. Suppose we wish to describe KM curves for the 
variable logwbc. Because logwbc is continuous, we 
need to categorize this variable before we compute 
KM curves. Suppose we categorize logwbc into 
three categories—low, medium, and high—as 
follows: 


low (0-2.30), n = 11; 
medium (2.31-3.00), n = 14; 
high (>3.00), n = 17. 
Based on this categorization, compute and graph
KM curves for each of the three categories of logwbc.
(You may use a computer program to assist you or
you can form three tables of ordered failure times
and compute KM probabilities directly.)

b. Compare the three KM plots you obtained in part a.
How are they different?

c. Below is an edited printout of the log-rank test
comparing the three groups.

         Events     Events
Group    observed   expected
1         4         13.06
2        10         10.72
3        16          6.21
Total    30         30.00

Log-rank = chi2(2) = 26.39
P-value = Pr > chi2 = 0.0000

What do you conclude about whether or not the
three survival curves are the same?


Test


To answer the questions below, you will need to use a computer
program (from SAS, Stata, SPSS, or any other package
you are familiar with) that computes and plots KM curves and
computes the log-rank test. Freely downloadable files can be
obtained from weblink http://www.sph.emory.edu/~dkleinb/surv2.htm.

1. For the vets.dat data set described in the presentation: 

a. Obtain KM plots for the two categories of the
variable cell type 1 (1 = large, 0 = other). Comment
on how the two curves compare with each other.
Carry out the log-rank test, and draw conclusions from
the test(s).

b. Obtain KM plots for the four categories of cell
type: large, adeno, small, and squamous. Note that
you will need to recode the data to define a single
variable which numerically distinguishes the four
categories (e.g., 1 = large, 2 = adeno, etc.). As in part
a, compare the four KM curves. Also, carry out the
log-rank test for the equality of the four curves and draw
conclusions.
2. The following questions consider a data set from a study
by Caplehorn et al. (“Methadone Dosage and Retention of
Patients in Maintenance Treatment,” Med. J. Aust., 1991).
These data comprise the times in days spent by heroin addicts
from entry to departure from one of two methadone
clinics. There are two further covariates, namely, prison
record and methadone dose, believed to affect the survival
times. The data set name is addicts.dat. A listing of
the variables is given below:


Column 1: Subject ID
Column 2: Clinic (1 or 2)
Column 3: Survival status (0 = censored, 1 = departed from clinic)
Column 4: Survival time in days
Column 5: Prison record (0 = none, 1 = any)
Column 6: Methadone dose (mg/day)


a. Compute and plot the KM plots for the two categories 
of the “clinic” variable and comment on the extent to 
which they differ. 

b. A printout of the log-rank and Wilcoxon tests (using 
Stata) is provided below. What are your conclusions 
from this printout? 


         Events     Events
Group    observed   expected
1        122         90.91
2         28         59.09
Total    150        150.00

Log-rank = chi2(1) = 27.89
P-value = Pr > chi2 = 0.0000
Wilcoxon = chi2(1) = 11.63
P-value = Pr > chi2 = 0.0007


c. Compute and evaluate KM curves and the log-rank 
test for comparing suitably chosen categories of the 
variable “Methadone dose.” Explain how you 
determined the categories for this variable. 
Answers to Practice Exercises


1. a. 


Group 1                                   |  Group 2
t_(j)   n_j   m_j   q_j   S(t_(j))        |  t_(j)   n_j   m_j   q_j   S(t_(j))
 0.0    25     0     0    1.00            |   0.0    25     0     0    1.00
 1.8    25     1     0     .96            |   1.4    25     1     0     .96
 2.2    24     1     0     .92            |   1.6    24     1     0     .92
 2.5    23     1     0     .88            |   1.8    23     1     0     .88
 2.6    22     1     0     .84            |   2.4    22     1     0     .84
 3.0    21     1     0     .80            |   2.8    21     1     0     .80
 3.5    20     1     0     .76            |   2.9    20     1     0     .76
 3.8    19     1     0     .72            |   3.1    19     1     0     .72
 5.3    18     1     0     .68            |   3.5    18     1     0     .68
 5.4    17     1     0     .64            |   3.6    17     1     0     .64
 5.7    16     1     0     .60            |   3.9    16     1     0     .60
 6.6    15     1     0     .56            |   4.1    15     1     0     .56
 8.2    14     1     0     .52            |   4.2    14     1     0     .52
 8.7    13     1     0     .48            |   4.7    13     1     0     .48
 9.2    12     2     0     .40            |   4.9    12     1     0     .44
 9.8    10     1     0     .36            |   5.2    11     1     0     .40
10.0     9     1     0     .32            |   5.8    10     1     0     .36
10.2     8     1     0     .28            |   5.9     9     1     0     .32
10.7     7     1     0     .24            |   6.5     8     1     0     .28
11.0     6     1     0     .20            |   7.8     7     1     0     .24
11.1     5     1     0     .16            |   8.3     6     1     0     .20
11.7     4     1     3     .12            |   8.4     5     1     0     .16
                                          |   8.8     4     1     0     .12
                                          |   9.1     3     1     0     .08
                                          |   9.9     2     1     0     .04
                                          |  11.4     1     1     0     .00


b. KM curves for CHR data: 



Group 1 appears to have consistently better 
survival prognosis than group 2. However, the KM 
curves are very close during the first four years, but 
are quite separate after four years, although they 
appear to come close again around twelve years. 
c. Using the expanded table format, the following
information is obtained:

t_(j)    m_1j   m_2j   n_1j   n_2j    e_1j     e_2j    m_1j − e_1j   m_2j − e_2j
 1.4      0      1      25     25     .500     .500      -.500          .500
 1.6      0      1      25     24     .510     .490      -.510          .510
 1.8      1      1      25     23    1.042     .958      -.042          .042
 2.2      1      0      24     22     .522     .478       .478         -.478
 2.4      0      1      23     22     .511     .489      -.511          .511
 2.5      1      0      23     21     .523     .477       .477         -.477
 2.6      1      0      22     21     .516     .484       .484         -.484
 2.8      0      1      21     21     .500     .500      -.500          .500
 2.9      0      1      21     20     .512     .488      -.512          .512
 3.0      1      0      21     19     .525     .475       .475         -.475
 3.1      0      1      20     19     .513     .487      -.513          .513
 3.5      1      1      20     18    1.053     .947      -.053          .053
 3.6      0      1      19     17     .528     .472      -.528          .528
 3.8      1      0      19     16     .543     .457       .457         -.457
 3.9      0      1      18     16     .529     .471      -.529          .529
 4.1      0      1      18     15     .545     .455      -.545          .545
 4.2      0      1      18     14     .563     .437      -.563          .563
 4.7      0      1      18     13     .581     .419      -.581          .581
 4.9      0      1      18     12     .600     .400      -.600          .600
 5.2      0      1      18     11     .621     .379      -.621          .621
 5.3      1      0      18     10     .643     .357       .357         -.357
 5.4      1      0      17     10     .630     .370       .370         -.370
 5.7      1      0      16     10     .615     .385       .385         -.385
 5.8      0      1      15     10     .600     .400      -.600          .600
 5.9      0      1      15      9     .625     .375      -.625          .625
 6.5      0      1      15      8     .652     .348      -.652          .652
 6.6      1      0      15      7     .682     .318       .318         -.318
 7.8      0      1      14      7     .667     .333      -.667          .667
 8.2      1      0      14      6     .700     .300       .300         -.300
 8.3      0      1      13      6     .684     .316      -.684          .684
 8.4      0      1      13      5     .722     .278      -.722          .722
 8.7      1      0      13      4     .765     .235       .335         -.335
 8.8      0      1      12      4     .750     .250      -.750          .750
 9.1      0      1      12      3     .800     .200      -.800          .800
 9.2      2      0      12      2    1.714     .286       .286         -.286
 9.8      1      0      10      2     .833     .167       .167         -.167
 9.9      0      1       9      2     .818     .182      -.818          .818
10.0      1      0       9      1     .900     .100       .100         -.100
10.2      1      0       8      1     .888     .112       .112         -.112
10.7      1      0       7      1     .875     .125       .125         -.125
11.0      1      0       6      1     .857     .143       .143         -.143
11.1      1      0       5      1     .833     .167       .167         -.167
11.4      0      1       4      1     .800     .200      -.800          .800
11.7      1      0       4      0    1.000     .000       .000          .000
Totals   22     25                  30.79    16.21     -8.690         8.690
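
Each row of the expanded table applies the same rule: the expected count for a group is its share of the risk set multiplied by the total number of failures at that time. The Python sketch below, with a hypothetical helper named `expected_counts`, recomputes two rows of the table as an illustration.

```python
def expected_counts(m1, m2, n1, n2):
    """Expected failures in each group at one ordered failure time."""
    total_failures = m1 + m2
    at_risk = n1 + n2
    e1 = n1 / at_risk * total_failures   # group 1 share of the risk set
    e2 = n2 / at_risk * total_failures   # group 2 share of the risk set
    return e1, e2


# (t_(j), m_1j, m_2j, n_1j, n_2j) copied from the rows for t = 1.8 and t = 9.2.
rows = [(1.8, 1, 1, 25, 23), (9.2, 2, 0, 12, 2)]
for t, m1, m2, n1, n2 in rows:
    e1, e2 = expected_counts(m1, m2, n1, n2)
    print(f"t = {t}: e_1j = {e1:.3f}, e_2j = {e2:.3f}, "
          f"m_1j - e_1j = {m1 - e1:.3f}, m_2j - e_2j = {m2 - e2:.3f}")
```

The two expected counts always add up to the total number of failures at that time, so the two observed minus expected columns are equal in size and opposite in sign.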
d. The log-rank statistic can be computed from the
totals of the expanded table using the formulae:

log-rank statistic $= \dfrac{(O_i - E_i)^2}{\operatorname{Var}(O_i - E_i)}$

$\operatorname{Var}(O_i - E_i) = \sum_j \dfrac{n_{1j}\, n_{2j}\, (m_{1j} + m_{2j})(n_{1j} + n_{2j} - m_{1j} - m_{2j})}{(n_{1j} + n_{2j})^2 (n_{1j} + n_{2j} - 1)}$

The variance turns out to be 9.448, so that the
log-rank statistic is $(8.69)^2/9.448 = 7.993$.
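
As a quick arithmetic check, the statistic and its P-value can be recomputed from the two totals. The Python sketch below is illustrative only; it takes 8.69 and 9.448 as given and uses the fact that, for a chi-square variable with 1 degree of freedom, the upper tail probability beyond x equals erfc(sqrt(x/2)).

```python
from math import erfc, sqrt

o_minus_e = 8.69    # |O - E| total from the expanded table
variance = 9.448    # Var(O - E), as stated in the answer

statistic = o_minus_e ** 2 / variance
p_value = erfc(sqrt(statistic / 2))   # upper tail of chi-square with 1 df

print(f"log-rank statistic = {statistic:.3f}")
print(f"P-value            = {p_value:.4f}")
```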

Using Stata, the results for the log-rank test are 
given as follows: 

         Events     Events
Group    observed   expected
1        22         30.79
2        25         16.21
Total    47         47.00

Log-rank = chi2(1) = 7.99
P-value = Pr > chi2 = 0.0047

The log-rank test gives highly significant results. 
This indicates that there is a significant difference in 
survival between the two groups. 

2. a. For the Anderson dataset, the KM plots for the three
categories of log WBC are shown below:



b. The KM curves are quite different with group 1 
having consistently better survival prognosis than 
group 2, and group 2 having consistently better 
survival prognosis than group 3. Note also that the 
difference between group 1 and 2 is about the same 
over time, whereas group 2 appears to diverge from 
group 3 as time increases. 
c. The log-rank statistic (26.391) is highly significant,
with a P-value equal to zero to three decimal places.
These results indicate that there is some overall
difference between the three curves.
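
The size of this P-value can be checked directly. For a chi-square variable with 2 degrees of freedom, the upper tail probability beyond x is exp(−x/2), so no statistical library is needed. The Python sketch below is only an illustration of that calculation.

```python
from math import exp

statistic = 26.391              # log-rank statistic with G - 1 = 2 df
p_value = exp(-statistic / 2)   # upper tail of chi-square with 2 df

print(f"P-value = {p_value:.2e}")
print(f"rounded to three decimal places: {p_value:.3f}")
```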


Appendix: Matrix Formula for the Log-Rank Statistic for Several Groups


For i = 1,2,... ,G and j = 1,2,... ,k, where G = # of 
groups and k = # of distinct failure times, 

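$n_{ij}$ = # at risk in ith group at jth ordered failure time

$m_{ij}$ = observed # of failures in ith group at jth ordered failure
time

$e_{ij}$ = expected # of failures in ith group at jth ordered failure
time

$\phantom{e_{ij}} = \left(\dfrac{n_{ij}}{n_{1j} + n_{2j}}\right)(m_{1j} + m_{2j})$

$n_j = \sum_{i=1}^{G} n_{ij}$

$m_j = \sum_{i=1}^{G} m_{ij}$

$O_i - E_i = \sum_{j=1}^{k} (m_{ij} - e_{ij})$

$\operatorname{Var}(O_i - E_i) = \sum_{j=1}^{k} \dfrac{n_{ij}(n_j - n_{ij})\, m_j (n_j - m_j)}{n_j^2 (n_j - 1)}$

$\operatorname{Cov}(O_i - E_i,\; O_l - E_l) = \sum_{j=1}^{k} \dfrac{-n_{ij}\, n_{lj}\, m_j (n_j - m_j)}{n_j^2 (n_j - 1)}$

$\mathbf{d} = (O_1 - E_1,\; O_2 - E_2,\; \ldots,\; O_{G-1} - E_{G-1})'$

$\mathbf{V} = ((v_{il}))$

where $v_{ii} = \operatorname{Var}(O_i - E_i)$ and $v_{il} = \operatorname{Cov}(O_i - E_i,\; O_l - E_l)$
for i = 1, 2, ..., G − 1; l = 1, 2, ..., G − 1.

Then, the log-rank statistic is given by the matrix product
formula:

Log-rank statistic $= \mathbf{d}'\, \mathbf{V}^{-1}\, \mathbf{d}$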

which has approximately a chi-square distribution with 
G — 1 degrees of freedom under the null hypothesis that all 
G groups have a common survival curve. 
Introduction


We begin by discussing some computer results using the Cox
PH model, without actually specifying the model; the purpose
here is to show the similarity between the Cox model and
standard linear regression or logistic regression.

We then introduce the Cox model and describe why it is so
popular. In addition, we describe its basic properties, including
the meaning of the proportional hazards assumption and
the Cox likelihood.


Abbreviated Outline


The outline below gives the user a preview of the material to
be covered by the presentation. A detailed outline for review
purposes follows the presentation.

I. A computer example using the Cox PH model 
(pages 86-94) 

II. The formula for the Cox PH model (pages 94-96) 

III. Why the Cox PH model is popular (pages 96-98) 

IV. ML estimation of the Cox PH model 
(pages 98-100) 

V. Computing the hazard ratio (pages 100-103) 

VI. Adjusted survival curves using the Cox PH model 
(pages 103-107) 

VII. The meaning of the PH assumption 
(pages 107-111) 

VIII. The Cox likelihood (pages 111-115) 

IX. Summary (pages 115-116) 
Objectives


Upon completing this chapter, the learner should be able to: 

1. State or recognize the general form of the Cox PH model. 

2. State the specific form of a Cox PH model appropriate for 
the analysis, given a survival analysis scenario involving 
one or more explanatory variables. 

3. State or recognize the form and properties of the baseline 
hazard function in the Cox PH model. 

4. Give three reasons for the popularity of the Cox PH 
model. 

5. State the formula for a designated hazard ratio of interest 
given a scenario describing a survival analysis using a 
Cox PH model, when 

a. there are confounders but no interaction terms in the 
model; 

b. there are both confounders and interaction terms in 
the model. 

6. State or recognize the meaning of the PH assumption. 

7. Determine and explain whether the PH assumption is 
satisfied when the graphs of the hazard functions for two 
groups cross each other over time. 

8. State or recognize what is an adjusted survival curve. 

9. Compare and/or interpret two or more adjusted survival 
curves. 

10. Given a computer printout involving one or more fitted 
Cox PH models, 

a. compute or identify any hazard ratio(s) of interest; 

b. carry out and interpret a designated test of 
hypothesis; 

c. carry out, identify or interpret a confidence interval 
for a designated hazard ratio; 

d. evaluate interaction and confounding involving one 
or more covariates. 

11. Give an example of how the Cox PH likelihood is formed. 
Presentation 



This presentation describes the Cox proportional 
hazards (PH) model, a popular mathematical 
model used for analyzing survival data. Here, we 
focus on the model form, why the model is popular, 
maximum likelihood (ML) estimation of the 
model parameters, the formula for the hazard ratio, 
how to obtain adjusted survival curves, and 
the meaning of the PH assumption. 


I. A Computer Example Using 
the Cox PH Model 


EXAMPLE 

Leukemia Remission Data 

Group 1 (n = 21)          Group 2 (n = 21) 
t (weeks)   log WBC       t (weeks)   log WBC 
6           2.31          1           2.80 
6           4.06          1           5.00 
6           3.28          2           4.91 
7           4.43          2           4.48 
10          2.96          3           4.01 
13          2.88          4           4.36 
16          3.60          4           2.42 
22          2.32          5           3.49 
23          2.57          5           3.97 
6+          3.20          8           3.52 
9+          2.80          8           3.05 
10+         2.70          8           2.32 
11+         2.60          8           3.26 
17+         2.16          11          3.49 
19+         2.05          11          2.12 
20+         2.01          12          1.50 
25+         1.78          12          3.06 
32+         2.20          15          2.30 
32+         2.53          17          2.95 
34+         1.47          22          2.73 
35+         1.45          23          1.97 

+ denotes censored observation 


We introduce the Cox PH model using computer 
output from the analysis of remission time data 
(Freireich et al., Blood, 1963), which we previously 
discussed in Chapters 1 and 2. The data set is listed 
here at the left. 

These data involve two groups of leukemia patients, 
with 21 patients in each group. Group 1 
is the treatment group, and group 2 is the placebo 
group. The data set also contains the variable log 
WBC, which is a well-known prognostic indicator 
of survival for leukemia patients. 

For this example, the basic question of interest 
concerns comparing the survival experience of the 
two groups adjusting for the possible confounding 
and/or interaction effects of log WBC. 
We are thus considering a problem involving two 
explanatory variables as predictors of survival 
time T, where T denotes “weeks until going 
out of remission.” We label the explanatory 
variables $X_1$ (for group status) and $X_2$ (for log 
WBC). The variable $X_1$ is the primary study or 
exposure variable of interest. The variable $X_2$ is 
an extraneous variable that we are including as a 
possible confounder or effect modifier. 

Note that if we want to evaluate the possible 
interaction effect of log WBC on group status, 
we would also need to consider a third variable, 
that is, the product of $X_1$ and $X_2$. 
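
The construction of the product term can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the 0/1 coding of group status (1 = placebo, 0 = treatment) is an assumption, not something stated at this point in the text.

```python
# Illustrative sketch: build X1, X2 and the product term X3 = X1 * X2.
# ASSUMPTION: group status is coded 1 = placebo, 0 = treatment.

def make_covariates(group, log_wbc):
    """Return (X1, X2, X3) for one subject, where X3 is the interaction term."""
    x1 = float(group)
    x2 = float(log_wbc)
    return x1, x2, x1 * x2

# One subject from each group, using log WBC values from the data listing.
treated = make_covariates(group=0, log_wbc=2.31)
placebo = make_covariates(group=1, log_wbc=2.80)
print("treated:", treated)
print("placebo:", placebo)
```

Under this coding the product term is zero for every subject in the treatment group and equals log WBC for every subject in the placebo group.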

For this dataset, the computer results from 
fitting three different Cox proportional hazards 
models are presented below. The computer 
package used is Stata. This is one of several 
packages that have procedures for carrying out 
a survival analysis using the Cox model. The 
information printed out by different packages 
will not have exactly the same format, but they 
will provide similar information. A comparison 
of output using Stata, SAS, and SPSS procedures 
on the same dataset is provided in the computer 
appendix at the back of this text. 

Edited output from Stata: 

Model 1: 

               Coef.    Std. Err.   z       P > |z|   Haz. Ratio   [95% Conf. Interval] 
Rx             1.509    0.410       3.68    0.000     4.523        2.027    10.094 

No. of subjects = 42    Log likelihood = -86.380    Prob > chi2 = 0.0001 

Model 2: 

               Coef.    Std. Err.   z       P > |z|   Haz. Ratio   [95% Conf. Interval] 
Rx             1.294    0.422       3.07    0.002     3.648        1.595     8.343 
log WBC        1.604    0.329       4.87    0.000     4.975        2.609     9.486 

No. of subjects = 42    Log likelihood = -72.280    Prob > chi2 = 0.0000 

Model 3: 

               Coef.    Std. Err.   z       P > |z|   Haz. Ratio   [95% Conf. Interval] 
Rx             2.355    1.681       1.40    0.161     10.537       0.391   284.201 
log WBC        1.803    0.447       4.04    0.000     6.067        2.528    14.561 
Rx x log WBC   -0.342   0.520       -0.66   0.510     0.710        0.256     1.967 

No. of subjects = 42    Log likelihood = -72.066    Prob > chi2 = 0.0000 


EXAMPLE (continued) 


T = weeks until going out of remission 

$X_1$ = group status = E 

$X_2$ = log WBC (confounding?) 


Interaction? 

$X_3 = X_1 \times X_2$ = group status x log WBC 


Computer results for three Cox PH 
models using the Stata package 

Other computer packages provide 
similar information. 

Computer Appendix: uses Stata, SAS, 
and SPSS on the same dataset. 













EDITED OUTPUT FROM STATA 

Model 1: 

               Coef.    Std. Err.   P > |z|   Haz. Ratio 
Rx             1.509    0.410       0.000     4.523 

No. of subjects = 42    Log likelihood = -86.380 

(The Haz. Ratio column gives hazard ratios.) 

Model 2: 

               Coef.    Std. Err.   P > |z|   Haz. Ratio 
Rx             1.294    0.422       0.002     3.648 
log WBC        1.604    0.329       0.000     4.975 

No. of subjects = 42    Log likelihood = -72.280 

Model 3: 

               Coef.    Std. Err.   P > |z|   Haz. Ratio 
Rx             2.355    1.681       0.161     10.537 
log WBC        1.803    0.447       0.000     6.067 
Rx x log WBC   -0.342   0.520       0.510     0.710 

No. of subjects = 42    Log likelihood = -72.066 



We now describe how to use the computer 
printout to evaluate the possible effect of treatment 
status on remission time adjusted for the 
potential confounding and interaction effects of 
the covariate log WBC. For now, we focus only 
on five columns of information provided in the 
printout, as presented at the left for all three 
models. 

For each model, the first column identifies 
the variables that have been included in the 
model. The second column gives estimates of 
regression coefficients corresponding to each 
variable in the model. The third column gives 
standard errors of the estimated regression 
coefficients. The fourth column gives p-values for 
testing the significance of each coefficient. The 
fifth column, labeled as Haz. Ratio, gives hazard 
ratios for the effect of each variable adjusted for 
the other variables in the model. 


Except for the Haz. Ratio column, these 
computer results are typical of output found 
in standard linear regression printouts. As the 
printout suggests, we can analyze the results from 
a Cox model in a manner similar to the way we 
would analyze a linear regression model. 


EXAMPLE (continued) 


Same dataset for each model 
n = 42 subjects 

T = time (weeks) until out of remission 

Model 1: Rx only 

Model 2: Rx and log WBC 

Model 3: Rx, log WBC, and 
Rx x log WBC 


We now distinguish among the output for the three 
models shown here. All three models are using the 
same remission time data on 42 subjects. The outcome 
variable for each model is the same: time 
in weeks until a subject goes out of remission. 
However, the independent variables are different 
for each model. Model 1 contains only the treatment 
status variable, indicating whether a subject 
is in the treatment or placebo group. Model 2 
contains two variables: treatment status and log 
WBC. And model 3 contains these two variables plus 
an interaction term defined as the product of 
treatment status and log WBC. 
EDITED OUTPUT: ML ESTIMATION 

Model 3: 

               Coef.    Std. Err.   P > |z|   Haz. Ratio 
Rx             2.355    1.681       0.161     10.537 
log WBC        1.803    0.447       0.000     6.067 
Rx x log WBC   -0.342   0.520       0.510     0.710 

No. of subjects = 42    Log likelihood = -72.066 


We now focus on the output for model 3. The 
method of estimation used to obtain the coefficients 
for this model, as well as the other two models, 
is maximum likelihood (ML) estimation. Note 
that a p-value of 0.510 is obtained for the coefficient 
of the product term for the interaction of 
treatment with log WBC. This p-value indicates 
that there is no significant interaction effect, so 
that we can drop the product term from the model 
and consider the other two models instead. 


EXAMPLE (continued) 


P = 0.510: Z = -0.342 / 0.520 = -0.66 (Wald statistic) 

LR statistic: uses Log likelihood = -72.066 

-2 ln L (log likelihood statistic) = -2 x (-72.066) 
= 144.132 

The p-value of 0.510 that we have just described is 
obtained by dividing the coefficient -0.342 of the 
product term by its standard error of 0.520, which 
gives -0.66, and then assuming that this quantity 
is approximately a standard normal or Z variable. 
This Z statistic is known as a Wald statistic, which 
is one of two test statistics typically used with ML 
estimates. The other test statistic, called the likelihood 
ratio, or LR statistic, makes use of the 
log likelihood statistic. The log likelihood statistic 
is obtained by multiplying the “Log likelihood” in 
the Stata output by -2 to get -2 ln L. 
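
The Wald calculation just described can be checked with a short Python sketch. The function names are hypothetical, and the two-sided p-value is computed from the standard normal distribution using only the standard library; because the inputs are the rounded values from the printout, the result may differ from the printed p-value in the third decimal place.

```python
import math

def wald_test(coef, std_err):
    """Return the Wald Z statistic and its two-sided standard-normal p-value."""
    z = coef / std_err
    # P(|Z| > |z|) for a standard normal variable equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

def minus_two_log_l(log_likelihood):
    """Convert a reported log likelihood into the log likelihood statistic -2 ln L."""
    return -2.0 * log_likelihood

# Product term of model 3: coefficient -0.342, standard error 0.520.
z, p = wald_test(coef=-0.342, std_err=0.520)
print(f"Z = {z:.2f}, two-sided P = {p:.3f}")
print(f"-2 ln L = {minus_two_log_l(-72.066):.3f}")
```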



We now look at the printout for model 2, which 
contains two variables. The treatment status variable 
(Rx) represents the exposure variable of primary 
interest. The log WBC variable is being considered 
as a confounder. Our goal is to describe the 
effect of treatment status adjusted for log WBC. 


EXAMPLE (continued) 


LR (interaction in model 3) 

$= -2 \ln L_{\text{model 2}} - (-2 \ln L_{\text{model 3}})$ 

In general: 

$LR = -2 \ln L_R - (-2 \ln L_F)$ 


To use the likelihood ratio (LR) statistic to test 
the significance of the interaction term, we need 
to compute the difference between the log likelihood 
statistic of the reduced model which does 
not contain the interaction term (model 2) and the 
log likelihood statistic of the full model containing 
the interaction term (model 3). In general, the LR 
statistic can be written in the form $-2 \ln L_R$ minus 
$-2 \ln L_F$, where R denotes the reduced model and 
F denotes the full model. 
EXAMPLE (continued) 


LR (interaction in model 3) 

$= -2 \ln L_{\text{model 2}} - (-2 \ln L_{\text{model 3}})$ 
= (-2 x -72.280) - (-2 x -72.066) 
= 144.560 - 144.132 = 0.428 

(LR is $\chi^2$ with 1 d.f. under $H_0$: 
no interaction.) 

0.50 < P < 0.60, not significant 
Wald test P = 0.510 


To obtain the LR statistic in this example, we compute 
144.560 minus 144.132 to obtain 0.428. Under 
the null hypothesis of no interaction effect, 
the test statistic has a chi-square distribution with 
p degrees of freedom, where p denotes the number 
of predictors being assessed; here p = 1, because 
only the product term is being tested. The p-value for 
this test is between 0.50 and 0.60 (0.428 falls just below 
0.455, the median of a chi-square distribution with 
1 d.f.), which indicates no significant interaction. 
Although the p-values for the Wald test (0.510) and 
the LR test are not exactly the same, both p-values 
lead to the same conclusion. 
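
As a rough check on this LR calculation, the following Python sketch computes the statistic from the two reported log likelihoods and converts it to a p-value. The function name is hypothetical, and the p-value formula is specific to the 1 d.f. case, where the chi-square tail probability can be written with the complementary error function from the standard library.

```python
import math

def likelihood_ratio_test_1df(loglik_reduced, loglik_full):
    """LR statistic for one added parameter and its chi-square (1 d.f.) p-value."""
    lr = -2.0 * loglik_reduced - (-2.0 * loglik_full)
    # For 1 d.f., P(chi-square > lr) = P(|Z| > sqrt(lr)) = erfc(sqrt(lr / 2)).
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value

# Reduced model = model 2 (no product term); full model = model 3.
lr, p = likelihood_ratio_test_1df(loglik_reduced=-72.280, loglik_full=-72.066)
print(f"LR = {lr:.3f}, P = {p:.3f}")
```

For tests involving more than one parameter, a general chi-square routine from a statistics package would be needed instead of this 1 d.f. shortcut.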


LR ≠ Wald 

When in doubt, use the LR test. 


In general, the LR and Wald statistics may not give 
exactly the same answer. Statisticians have shown 
that of the two test procedures, the LR statistic has 
better statistical properties, so when in doubt, you 
should use the LR test. 


OUTPUT 

Model 2: 

               Coef.    Std. Err.   P > |z|   Haz. Ratio 
Rx             1.294    0.422       0.002     3.648 
log WBC        1.604    0.329       0.000     4.975 

No. of subjects = 42    Log likelihood = -72.280 


Three statistical objectives. 


1. test for significance of effect 

2. point estimate of effect 

3. confidence interval for effect 


EXAMPLE (continued) 


Test for treatment effect: 

Wald statistic: P = 0.002 (highly 

significant) 


LR statistic: compare 

-2 log L from model 2 with 
-2 log L from model without Rx 
variable 

Printout not provided here 


Conclusion: treatment effect is significant, 
after adjusting for log WBC 


We now focus on how to assess the effect of 
treatment status adjusting for log WBC using the 
model 2 output, again shown here. 


There are three statistical objectives typically 
considered. One is to test for the significance 
of the treatment status variable, adjusted for 
log WBC. Another is to obtain a point estimate 
of the effect of treatment status, adjusted for 
log WBC. And a third is to obtain a confidence 
interval for this effect. We can accomplish 
these three objectives using the output provided, 
without having to explicitly describe the formula 
for the Cox model being used. 

To test for the significance of the treatment 
effect, the p-value provided in the table for the 
Wald statistic is 0.002, which is highly significant. 
Alternatively, a likelihood ratio (LR) test could 
be performed by comparing the log likelihood 
statistic (144.560) for model 2 with the log 
likelihood statistic for a model which does not 
contain the treatment variable. This latter model, 
which should contain only the log WBC variable, 
is not provided here, so we will not report on it 
other than to note that the LR test is also very 
significant. Thus, these test results show that 
using model 2, the treatment effect is significant, 
after adjusting for log WBC. 
EXAMPLE (continued) 


Point estimate: 

$\widehat{HR} = 3.648 = e^{1.294}$ 

where 1.294 is the coefficient of the treatment variable 


A point estimate of the effect of the treatment is 
provided in the HR column by the value 3.648. 
This value gives the estimated hazard ratio (HR) 
for the effect of the treatment; in particular, 
we see that the hazard for the placebo group is 
3.6 times the hazard for the treatment group. 
Note that the value 3.648 is calculated as e to the 
coefficient of the treatment variable; that is, e to 
the 1.294 equals 3.648. 


To describe the confidence interval for the effect 
of treatment status, we consider the output 
for the extended table for model 2 given earlier. 



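EXAMPLE (continued) 


95% confidence interval for the HR: 
(1.595, 8.343) 

[Diagram: the interval from 1.595 to 8.343 on a number line, with the point estimate 3.648 inside it] 

95% CI for $\beta_1$: 1.294 ± (1.96)(0.422) 

95% CI for $HR = e^{\beta_1}$: 

$\exp[\hat{\beta}_1 \pm 1.96\, s_{\hat{\beta}_1}] = e^{1.294 \pm 1.96(0.422)}$ 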


From the table, we see that a 95% confidence 
interval for the treatment effect is given by the 
range of values 1.595-8.343. This is a confidence 
interval for the hazard ratio (HR), which surrounds 
the point estimate of 3.648 previously 
described. Notice that this confidence interval is 
fairly wide, indicating that the point estimate is 
somewhat unreliable. As expected from the low 
p-value of 0.002, the confidence interval for HR 
does not contain the null value of 1. 

The calculation of the confidence interval 
for HR is carried out as follows: 

1. Compute a 95% confidence interval for the regression 
coefficient of the Rx variable ($\beta_1$). The 
large sample formula is 1.294 plus or minus 
1.96 times the standard error 0.422, where 1.96 
is the 97.5 percentile of the standard normal or 
Z distribution. 

2. Exponentiate the two limits obtained for the 
confidence interval for the regression coefficient 
of Rx. 
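
The two steps above can be sketched in Python as follows. The function name is hypothetical. Because the coefficient and standard error are entered as the rounded values from the printout, the limits agree with the printed interval only to about two decimal places.

```python
import math

def hazard_ratio_ci(coef, std_err, z_crit=1.96):
    """Return (HR, lower, upper) using the large-sample formula exp(coef +/- z * se)."""
    hr = math.exp(coef)
    # Step 1: confidence limits for the regression coefficient.
    coef_lower = coef - z_crit * std_err
    coef_upper = coef + z_crit * std_err
    # Step 2: exponentiate the two limits to get limits for the hazard ratio.
    return hr, math.exp(coef_lower), math.exp(coef_upper)

# Rx in model 2: coefficient 1.294, standard error 0.422.
hr, lower, upper = hazard_ratio_ci(coef=1.294, std_err=0.422)
print(f"HR = {hr:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Note that the interval is symmetric around the coefficient on the log scale, but not around the hazard ratio itself.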
Stata: provides CI directly 

Other packages: provide $\hat{\beta}$'s and $s_{\hat{\beta}}$'s 

The Stata output provides the required confidence 
interval directly, so that the user does not have to 
carry out the computations required by the large 
sample formula. Other computer packages may 
not provide the confidence interval directly, but, 
rather, may provide only the estimated regression 
coefficients and their standard errors. 


EDITED OUTPUT 

Model 1: 

               Coef.    Std. Err.   P > |z|   Haz. Ratio 
Rx             1.509    0.410       0.000     4.523 

No. of subjects = 42    Log likelihood = -86.380 

Model 2: 

               Coef.    Std. Err.   P > |z|   Haz. Ratio 
Rx             1.294    0.422       0.002     3.648 
log WBC        1.604    0.329       0.000     4.975 

No. of subjects = 42    Log likelihood = -72.280 


To this point, we have made use of information 
from outputs for models 2 and 3, but have not yet 
considered the model 1 output, which is shown 
again here. Note that model 1 contains only the 
treatment status variable, whereas model 2, shown 
below, contains log WBC in addition to treatment 
status. Model 1 is sometimes called the “crude” 
model because it ignores the effect of potential 
covariates of interest, like log WBC. 


EXAMPLE (continued) 


HR for model 1 (4.523) is higher than 
HR for model 2 (3.648). 


Confounding: crude versus adjusted 
$\widehat{HR}$'s are meaningfully different. 


Confounding due to log WBC 
=> must control for log WBC, i.e., 
prefer model 2 to model 1. 


If no confounding, then consider precision: 
e.g., if 95% CI is narrower for 
model 2 than model 1, we prefer model 2. 


Model 1 can be used in comparison with model 2 
to evaluate the potential confounding effect of the 
variable log WBC. In particular, notice that the 
value in the HR column for the treatment status 
variable is 4.523 for model 1, but only 3.648 for 
model 2. Thus, the crude model yields an estimated 
hazard ratio that is somewhat higher than 
the corresponding estimate obtained when we adjust 
for log WBC. If we decide that the crude and 
adjusted estimates are meaningfully different, we 
then say that there is confounding due to log WBC. 

Once we decide that confounding is present, we 
then must control for the confounder—in this 
case, log WBC—in order to obtain a valid estimate 
of the effect. Thus, we prefer model 2, which 
controls for log WBC, to model 1, which does not. 

Note that if we had decided that there is no “meaningful” 
confounding, then we would not need to 
control for log WBC to get a valid answer. Nevertheless, 
we might wish to control for log WBC 
anyhow, to obtain a more precise estimate of the 
hazard ratio. That is, if the confidence interval for 
the HR is narrower when using model 2 than when 
using model 1, we would prefer model 2 to model 1 
for precision gain. 
EDITED OUTPUT: Confidence Intervals 

                     [95% Conf. Interval] 
Model 1:  Rx         2.027    10.094      width = 8.067 
Model 2:  Rx         1.595     8.343      width = 6.748 
          log WBC    2.609     9.486 




The confidence intervals for Rx in each model 
are shown here at the left. The interval for Rx in 
model 1 has width equal to 10.094 minus 2.027, or 
8.067; for model 2, the width is 8.343 minus 1.595, 
or 6.748. Therefore, model 2 gives a more precise 
estimate of the hazard ratio than does model 1. 
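
The comparison of the crude and adjusted models can be summarized with a small Python sketch. The dictionary layout and the percentage-change summary are illustrative choices only; the text leaves the judgment of what counts as a “meaningful” difference to the analyst and does not prescribe a numerical rule.

```python
def ci_width(lower, upper):
    """Width of a confidence interval."""
    return upper - lower

# Hazard ratios and 95% confidence limits for Rx, taken from the printouts.
crude = {"hr": 4.523, "ci": (2.027, 10.094)}      # model 1
adjusted = {"hr": 3.648, "ci": (1.595, 8.343)}    # model 2

crude_width = ci_width(*crude["ci"])
adjusted_width = ci_width(*adjusted["ci"])

# One hypothetical way to express how far the crude estimate is from the adjusted one.
pct_change = 100.0 * (crude["hr"] - adjusted["hr"]) / adjusted["hr"]

print(f"width, model 1: {crude_width:.3f}")
print(f"width, model 2: {adjusted_width:.3f}")
print(f"crude HR exceeds adjusted HR by {pct_change:.1f}%")
```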


EXAMPLE (continued) 


Model 2 is best model. 

HR = 3.648 statistically significant 
95% CI for HR: (1.6, 8.3) 


Our analysis of the output for the three models 
has led us to conclude that model 2 is the best of 
the three models and that, using model 2, we get 
a statistically significant hazard ratio of 3.648 for 
the effect of the treatment, with a 95% confidence 
interval ranging between 1.6 and 8.3. 


Cox model formulae not specified 

Analysis strategy and methods for 
Cox model analogous to those for 
logistic and classical linear models. 


EXAMPLE (continued) 


Survival Curves Adjusted for log WBC 
(Model 2) 

[Figure: adjusted survival curves, S(t) plotted against time, for the treatment and placebo groups] 



Note that we were able to carry out this analysis 
without actually specifying the formulae for the 
Cox PH models being fit. Also, the strategy and 
methods used with the output provided have been 
completely analogous to the strategy and methods 
one uses when fitting logistic regression models 
(see Kleinbaum and Klein, Logistic Regression, 
Chapters 6 and 7, 2002), and very similar to 
carrying out a classical linear regression analysis 
(see Kleinbaum et al., Applied Regression Analysis, 
3rd ed., Chapter 16, 1997). 

In addition to the above analysis of this 
data, we can also obtain survival curves for each 
treatment group, adjusted for the effects of log 
WBC and based on the model 2 output. Such 
curves, sketched here at the left, give additional 
information to that provided by estimates and 
tests about the hazard ratio. In particular, these 
curves describe how the treatment groups compare 
over the time period of the study. 


For these data, the survival curves show 
that the treatment group consistently has higher 
survival probabilities than the placebo group after 
adjusting for log WBC. Moreover, the difference 
between the two groups appears to widen over 
time. 
Adjusted survival curves      KM curves 
Adjusted for covariates       No covariates 
Use fitted Cox model          No Cox model fitted 

Remainder: 



• Cox model formula 

• basic characteristics of Cox 
model 

• meaning of PH assumption 


Note that adjusted survival curves are mathematically 
different from Kaplan-Meier (KM) curves. 
KM curves do not adjust for covariates and, therefore, 
are not computed using results from a fitted 
Cox PH model. 

Nevertheless, for these data, the plotted KM curves 
(which were described in Chapter 2) are similar in 
appearance to the adjusted survival curves. 

In the remainder of this presentation, we describe 
the Cox PH formula and its basic characteristics, 
including the meaning of the PH assumption and 
the Cox likelihood. 


II. The Formula for the 
Cox PH Model 


$h(t,\mathbf{X}) = h_0(t)\, e^{\sum_{i=1}^{p} \beta_i X_i}$ 

$\mathbf{X} = (X_1, X_2, \ldots, X_p)$ 

explanatory/predictor variables 


$h_0(t) \times e^{\sum_{i=1}^{p} \beta_i X_i}$ 

Baseline hazard: involves t but not X's 

Exponential: involves X's but not t (X's are time-independent) 


The Cox PH model is usually written in terms 
of the hazard model formula shown here at the 
left. This model gives an expression for the hazard 
at time t for an individual with a given specification 
of a set of explanatory variables denoted by 
the bold X. That is, the bold X represents a collection 
(sometimes called a “vector”) of predictor 
variables that is being modeled to predict an individual’s 
hazard. 

The Cox model formula says that the hazard at 
time t is the product of two quantities. The first 
of these, $h_0(t)$, is called the baseline hazard 
function. The second quantity is the exponential 
expression e to the linear sum of $\beta_i X_i$, where the 
sum is over the p explanatory X variables. 


An important feature of this formula, which 
concerns the proportional hazards (PH) assumption, 
is that the baseline hazard is a function of 
t, but does not involve the X's. In contrast, the 
exponential expression shown here involves the 
X's, but does not involve t. The X's here are called 
time-independent X's. 
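
The product structure of the formula can be illustrated with a minimal Python sketch. The baseline hazard used here is an arbitrary made-up function, since the model leaves $h_0(t)$ unspecified, and the 0/1 coding of Rx (1 = placebo) is an assumption. The coefficients are the model 2 estimates from the printout.

```python
import math

def cox_hazard(t, x, betas, baseline_hazard):
    """h(t, X) = h0(t) * exp(sum_i beta_i * X_i) for time-independent X's."""
    linear_sum = sum(b * xi for b, xi in zip(betas, x))
    return baseline_hazard(t) * math.exp(linear_sum)

def example_baseline(t):
    """ASSUMPTION: an arbitrary baseline hazard, chosen only for illustration."""
    return 0.05 + 0.01 * t

betas = [1.294, 1.604]   # model 2 estimates for Rx and log WBC
placebo = [1, 3.0]       # ASSUMPTION: Rx = 1 denotes placebo
treated = [0, 3.0]       # same log WBC, Rx = 0

ratios = []
for t in (1.0, 5.0, 20.0):
    ratio = (cox_hazard(t, placebo, betas, example_baseline)
             / cox_hazard(t, treated, betas, example_baseline))
    ratios.append(ratio)
    print(f"t = {t:5.1f}  hazard ratio = {ratio:.3f}")
```

Because t enters only through the baseline hazard and the X's enter only through the exponential, the baseline cancels in the ratio, so the same value is printed at every time point.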
X's involving t: time-dependent 

Requires extended Cox model (no 
PH) 


Time-dependent variables: 
Chapter 6 


Time-independent variable: 

Values for a given individual 
do not change over time; e.g., 
SEX and SMK (SMK is assumed not to change 
once measured) 

AGE and WGT values do not change 
much, or effect on survival depends 
on one measurement. 


$X_1 = X_2 = \cdots = X_p = 0$ 

$h(t,\mathbf{X}) = h_0(t)\, e^{\sum_{i=1}^{p} \beta_i X_i}$ 

$= h_0(t)\, e^{0}$ 

$= h_0(t)$ (baseline hazard) 

No X's in model: $h(t,\mathbf{X}) = h_0(t)$. 


$h_0(t)$ is unspecified. 

Cox model: semiparametric 

It is possible, nevertheless, to consider X's which 
do involve t. Such X's are called time-dependent 
variables. If time-dependent variables are considered, 
the Cox model form may still be used, but 
such a model no longer satisfies the PH assumption, 
and is called the extended Cox model. 

The use of time-dependent variables is discussed 
in Chapter 6. For the remainder of this presentation, 
we will consider time-independent X's only. 

A time-independent variable is defined to be any 
variable whose value for a given individual does 
not change over time. Examples are SEX and 
smoking status (SMK). Note, however, that a person’s 
smoking status may actually change over 
time, but for purposes of the analysis, the SMK 
variable is assumed not to change once it is measured, 
so that only one value per individual is used. 

Also note that although variables like AGE and 
weight (WGT) change over time, it may be appropriate 
to treat such variables as time-independent 
in the analysis if their values do not change much 
over time or if the effect of such variables on survival 
risk depends essentially on the value at only 
one measurement. 

The Cox model formula has the property that if 
all the X's are equal to zero, the formula reduces 
to the baseline hazard function. That is, the exponential 
part of the formula becomes e to the zero, 
which is 1. This property of the Cox model is the 
reason why $h_0(t)$ is called the baseline function. 


Or, from a slightly different perspective, the Cox 
model reduces to the baseline hazard when no X's 
are in the model. Thus, $h_0(t)$ may be considered 
as a starting or “baseline” version of the hazard 
function, prior to considering any of the X's. 

Another important property of the Cox model is 
that the baseline hazard, $h_0(t)$, is an unspecified 
function. It is this property that makes the Cox 
model a semiparametric model. 
In contrast, a parametric model is one whose 
functional form is completely specified, except for 
the values of the unknown parameters. For example, 
the Weibull hazard model is a parametric 
model and has the form shown here, where the 
unknown parameters are $\lambda$, p, and the $\beta_i$'s. Note 
that for the Weibull model as written here, $h_0(t)$ is 
given by $p t^{p-1}$, the hazard when all the X's equal 
zero so that $\lambda = 1$ (see Chapter 7). 

Semiparametric property 
=> Popularity of the Cox model 

One of the reasons why the Cox model is so popular 
is that it is semiparametric. We discuss this and 
other reasons in the next section (III) concerning 
why the Cox model is so widely used. 


EXAMPLE: Parametric Model 


Weibull: 

$h(t,\mathbf{X}) = \lambda p t^{p-1}$ 

where $\lambda = \exp\left[\sum_{i=1}^{p} \beta_i X_i\right]$ 

and $h_0(t) = p t^{p-1}$ 
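
The Weibull form in the example above can be written as a short Python function. This is a sketch only: the function name and the numerical values of the shape parameter p and the coefficients are hypothetical, chosen to show that the hazard reduces to $p t^{p-1}$ when all X's are zero and is constant over time when p = 1.

```python
import math

def weibull_hazard(t, x, betas, p):
    """Weibull hazard h(t, X) = lambda * p * t**(p - 1), lambda = exp(sum beta_i X_i)."""
    lam = math.exp(sum(b * xi for b, xi in zip(betas, x)))
    return lam * p * t ** (p - 1)

# Hypothetical parameter values, for illustration only.
betas = [0.7, -0.2]
shape = 1.5

baseline_value = weibull_hazard(4.0, [0, 0], betas, shape)   # all X's equal to zero
with_covariates = weibull_hazard(4.0, [1, 2.0], betas, shape)
print(f"baseline hazard at t = 4: {baseline_value:.4f}")
print(f"hazard with X = (1, 2.0): {with_covariates:.4f}")

# With p = 1 the hazard no longer depends on t (the exponential model).
print(weibull_hazard(1.0, [1, 2.0], betas, 1.0), weibull_hazard(9.0, [1, 2.0], betas, 1.0))
```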


III. Why the Cox PH Model 
Is Popular 

Cox PH model is “robust”: Will 
closely approximate correct 
parametric model 


A key reason for the popularity of the Cox model is 
that, even though the baseline hazard is not specified, 
reasonably good estimates of regression coefficients, 
hazard ratios of interest, and adjusted 
survival curves can be obtained for a wide variety 
of data situations. Another way of saying this is 
that the Cox PH model is a “robust” model, so that 
the results from using the Cox model will closely 
approximate the results for the correct parametric 
model. 


If correct model is: 

Weibull => Cox model will approximate Weibull 

Exponential => Cox model will approximate exponential 


For example, if the correct parametric model is 
Weibull, then use of the Cox model typically will 
give results comparable to those obtained using a 
Weibull model. Or, if the correct model is exponential, 
then the Cox model results will closely approximate 
the results from fitting an exponential 
model. 


Prefer parametric model if sure of 
correct model, e.g., use goodness- 
of-fit test (Lee, 1982). 


We would prefer to use a parametric model if we 
were sure of the correct model. Although there are 
various methods for assessing goodness of fit of a 
parametric model (for example, see Lee, Statistical 
Methods for Survival Data Analysis, 1982), we may 
not be completely certain that a given parametric 
model is appropriate. 
When in doubt, the Cox model is a 
“safe” choice. 

Thus, when in doubt, as is typically the case, the 
Cox model will give reliable enough results so that 
it is a “safe” choice of model, and the user does 
not need to worry about whether the wrong parametric 
model is chosen. 

In addition to the general “robustness” of the Cox 
model, the specific form of the model is attractive 
for several reasons. 


$h(t,\mathbf{X}) = h_0(t) \times e^{\sum_{i=1}^{p} \beta_i X_i}$ 

(baseline hazard x exponential) 

$0 \le h(t,\mathbf{X}) < \infty$ always 


As described previously, the specific form of the 
Cox model gives the hazard function as a product 
of a baseline hazard involving t and an exponential 
expression involving the X's without t. The exponential 
part of this product is appealing because 
it ensures that the fitted model will always give 
estimated hazards that are nonnegative. 


$h_0(t) \times \sum_{i=1}^{p} \beta_i X_i$ 

Linear: might be < 0 


We want such nonnegative estimates because, by 
definition, the values of any hazard function must 
range between zero and plus infinity, that is, a hazard 
is always nonnegative. If, instead of an exponential 
expression, the X part of the model were, 
for example, linear in the X's, we might obtain negative 
hazard estimates, which are not allowed. 
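
A small numerical sketch in Python makes the contrast concrete. The coefficient values, covariate values, and baseline value below are hypothetical, chosen so that the linear sum is negative.

```python
import math

def linear_form(h0, betas, x):
    """Hypothetical alternative: h0 multiplied by the linear sum itself."""
    return h0 * sum(b * xi for b, xi in zip(betas, x))

def exponential_form(h0, betas, x):
    """Cox form: h0 multiplied by e to the linear sum."""
    return h0 * math.exp(sum(b * xi for b, xi in zip(betas, x)))

# Hypothetical values for illustration only.
h0_value = 0.10            # value of the baseline hazard at some time t
betas = [-0.8, 0.3]
x = [1, 1.5]               # linear sum = -0.8 + 0.45 = -0.35

print("linear form:     ", linear_form(h0_value, betas, x))
print("exponential form:", exponential_form(h0_value, betas, x))
```

The linear form produces a negative value here, whereas the exponential form stays positive for any choice of coefficients and covariate values.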


Even though $h_0(t)$ is unspecified, 
we can estimate the $\beta$'s. 

Measure of effect: hazard ratio (HR) 
involves only $\beta$'s, without 
estimating $h_0(t)$. 


Another appealing property of the Cox model is 
that, even though the baseline hazard part of the 
model is unspecified, it is still possible to estimate 
the $\beta$'s in the exponential part of the model. As 
we will show later, all we need are estimates of the 
$\beta$'s to assess the effect of explanatory variables of 
interest. The measure of effect, which is called a 
hazard ratio, is calculated without having to estimate 
the baseline hazard function. 


Can estimate $h(t,\mathbf{X})$ and $S(t,\mathbf{X})$ 
for Cox model using a minimum 
of assumptions. 

Note that the hazard function $h(t,\mathbf{X})$ and its corresponding 
survival curves $S(t,\mathbf{X})$ can be estimated 
for the Cox model even though the baseline hazard 
function is not specified. Thus, with the Cox 
model, using a minimum of assumptions, we can 
obtain the primary information desired from a 
survival analysis, namely, a hazard ratio and a survival 
curve. 
Cox model preferred to logistic model. 

Cox model: uses survival times and censoring 

Logistic model: uses (0,1) outcome; 
ignores survival times and censoring 


One last point about the popularity of the Cox 
model is that it is preferred over the logistic model 
when survival time information is available and 
there is censoring. That is, the Cox model uses 
more information—the survival times—than the 
logistic model, which considers a (0,1) outcome 
and ignores survival times and censoring. 


IV. ML Estimation of the 
Cox PH Model 

$h(t,\mathbf{X}) = h_0(t)\, e^{\sum_{i=1}^{p} \beta_i X_i}$ 


We now describe how estimates are obtained for 
the parameters of the Cox model. The parameters 
are the $\beta$'s in the general Cox model formula 
shown here. The corresponding estimates of these 
parameters are called maximum likelihood (ML) 
estimates and are denoted as $\hat{\beta}_i$ (“$\beta_i$ hat”). 


ML estimates: $\hat{\beta}_i$ 

               Coef.    Std. Err.   P > |z|   Haz. Ratio 
Rx             1.294    0.422       0.002     3.648 
log WBC        1.604    0.329       0.000     4.975 

No. of subjects = 42    Log likelihood = -72.280 


Estimated model: 

$\hat{h}(t,\mathbf{X}) = \hat{h}_0(t)\, e^{1.294\,Rx + 1.604\,\log WBC}$ 


As an example of ML estimates, we consider once 
again the computer output for one of the models 
(model 2) fitted previously from remission data 
on 42 leukemia patients. 

The Cox model for this example involves 
two parameters, one being the coefficient of 
the treatment variable (denoted here as Rx) 
and the other being the coefficient of the log 
WBC variable. The expression for this model is 
shown at the left, which contains the estimated 
coefficients 1.294 for Rx and 1.604 for log white 
blood cell count. 
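
The exponential part of this estimated model can be evaluated directly, as in the Python sketch below. The function name is hypothetical, the 0/1 coding of Rx (1 = placebo) is an assumption, and the log WBC values are chosen only for illustration. The baseline hazard is not needed for these ratios.

```python
import math

COEF_RX = 1.294        # estimated coefficient of Rx (model 2)
COEF_LOG_WBC = 1.604   # estimated coefficient of log WBC (model 2)

def relative_hazard(rx, log_wbc):
    """exp(1.294*Rx + 1.604*log WBC): the estimated hazard relative to the baseline."""
    return math.exp(COEF_RX * rx + COEF_LOG_WBC * log_wbc)

# Effect of treatment status, holding log WBC fixed (ASSUMPTION: Rx = 1 is placebo).
hr_rx = relative_hazard(1, 2.5) / relative_hazard(0, 2.5)

# Effect of a one-unit increase in log WBC, holding Rx fixed.
hr_log_wbc = relative_hazard(0, 3.5) / relative_hazard(0, 2.5)

print(f"HR for Rx:      {hr_rx:.3f}")
print(f"HR for log WBC: {hr_log_wbc:.3f}")
```

Apart from rounding of the coefficients, these ratios correspond to the entries in the Haz. Ratio column of the printout.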


ML estimates: maximize likelihood 
function L 


L = joint probability of observed 
data = $L(\beta)$ 


As with logistic regression, the ML estimates of 
the Cox model parameters are derived by maximizing 
a likelihood function, usually denoted as 
L. The likelihood function is a mathematical expression 
which describes the joint probability of 
obtaining the data actually observed on the subjects 
in the study as a function of the unknown parameters 
(the $\beta$'s) in the model being considered. 
L is sometimes written notationally as $L(\beta)$ where 
$\beta$ denotes the collection of unknown parameters. 


The expression for the likelihood is developed at 
the end of the chapter. However, we give a brief 
overview below. 
L is a partial likelihood:

• considers probabilities only 
for subjects who fail 

• does not consider probabilities 
for subjects who are censored 


$L = L_1 \times L_2 \times L_3 \times \cdots \times L_k = \prod_{j=1}^{k} L_j$

where k = number of failure times and

$L_j$ = portion of L for the jth failure
time given the risk set $R(t_{(j)})$


The formula for the Cox model likelihood func¬ 
tion is actually called a “partial” likelihood func¬ 
tion rather than a (complete) likelihood function. 
The term “partial” likelihood is used because the 
likelihood formula considers probabilities only for 
those subjects who fail, and does not explicitly 
consider probabilities for those subjects who are 
censored. Thus the likelihood for the Cox model 
does not consider probabilities for all subjects, 
and so it is called a “partial” likelihood. 

In particular, the partial likelihood can be written
as the product of several likelihoods, one for each
of, say, k failure times. Thus, at the jth failure time,
L_j denotes the likelihood of failing at this time,
given survival up to this time. Note that the set of
individuals at risk at the jth failure time is called
the “risk set,” R(t(j)), and this set will change
(in fact, get smaller in size) as the failure time
increases.


Information on censored subjects 
used prior to censorship. 

[Figure: L_j uses information on a subject in R(t(j)) who is censored later]



Thus, although the partial likelihood focuses on
subjects who fail, survival time information prior
to censorship is used for those subjects who are
censored. That is, a person who is censored after
the jth failure time is part of the risk set used to
compute L_j, even though this person is censored
later.


Steps for obtaining ML estimates: 

• form L from model 

• maximize ln L by solving

$\frac{\partial \ln L}{\partial \beta_i} = 0$,

i = 1, ..., p (# of parameters)


Once the likelihood function is formed for a given 
model, the next step for the computer is to maxi¬ 
mize this function. This is generally done by maxi¬ 
mizing the natural log of L, which is computation¬ 
ally easier. 


Solution by iteration: 

• guess at solution 

• modify guess in successive steps 

• stop when solution is obtained 


The maximization process is carried out by taking
partial derivatives of the log of L with respect to
each parameter in the model, setting each derivative
equal to zero, and then solving the resulting
system of equations as shown here. This solution
is carried out using iteration. That is, the solution
is obtained in a stepwise manner, which starts
with a guessed value for the solution, and then
successively modifies the guessed value until a
solution is finally obtained.
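
As a rough sketch of this guess-and-modify process, the Python code below applies Newton-Raphson steps (one common choice of iteration; the text does not name a specific algorithm) to the log partial likelihood of a Cox model with a single predictor. The function name, the starting guess of 0, the stopping tolerance, and the four-subject toy dataset are illustrative assumptions, and the code assumes no tied failure times.

```python
import math

def newton_raphson_cox(times, status, x, beta=0.0, tol=1e-8, max_iter=50):
    """Iteratively maximize ln L for a one-predictor Cox model (no ties)."""
    for _ in range(max_iter):
        score = 0.0  # first derivative of ln L with respect to beta
        info = 0.0   # minus the second derivative of ln L
        for j, (t_j, d_j) in enumerate(zip(times, status)):
            if d_j != 1:
                continue  # censored subjects contribute no term of their own
            # Risk set: everyone still under observation at time t_j
            risk = [i for i, t in enumerate(times) if t >= t_j]
            w = [math.exp(beta * x[i]) for i in risk]
            s0 = sum(w)
            s1 = sum(wi * x[i] for wi, i in zip(w, risk))
            s2 = sum(wi * x[i] ** 2 for wi, i in zip(w, risk))
            score += x[j] - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        step = score / info
        beta += step          # modify the current guess
        if abs(step) < tol:   # stop when the guess no longer changes
            break
    return beta

# Toy data (assumed for illustration): time, event indicator, (0,1) predictor
times = [2, 3, 5, 8]
status = [1, 1, 0, 1]
x = [1, 0, 0, 1]
beta_hat = newton_raphson_cox(times, status, x)
```

For this toy dataset the score equation can also be solved by hand, which gives a convenient check on the iteration.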
Statistical inferences for hazard
ratios: (See Section I, pages 86-94) 


Test hypotheses 

Confidence intervals 

Wald test 

LR test 

Large sample 95% CI


$\widehat{HR} = e^{\hat{\beta}}$ for a (0,1) exposure

variable (no interaction)


Once the ML estimates are obtained, we are usu¬ 
ally interested in carrying out statistical inferences 
about hazard ratios defined in terms of these esti¬ 
mates. We illustrated previously how to test hy¬ 
potheses and form confidence intervals for the 
hazard ratio in Section I above. There, we de¬ 
scribed how to compute a Wald test and a likeli¬ 
hood ratio (LR) test. We also illustrated how to cal¬ 
culate a large sample 95% confidence interval for 
a hazard ratio. The estimated hazard ratio (HR) 
was computed by exponentiating the coefficient 
of a (0,1) exposure variable of interest. Note that 
the model contained no interaction terms involv¬ 
ing exposure. 
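
As a brief illustration, the Python sketch below applies the standard large-sample formulas, Z = coefficient / standard error for the Wald test and exp(coefficient ± 1.96 × standard error) for the 95% confidence interval, to the Rx row of the model 2 output (coefficient 1.294, standard error 0.422). The function names are our own, and the two-sided P-value assumes the Wald statistic is approximately standard normal.

```python
import math

def wald_test(coef, se):
    """Return the Wald Z statistic and its two-sided normal P-value."""
    z = coef / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # P(|Z| > z) for standard normal Z
    return z, p_value

def hazard_ratio_ci(coef, se, z_crit=1.96):
    """Large-sample confidence interval for the hazard ratio exp(coef)."""
    return math.exp(coef - z_crit * se), math.exp(coef + z_crit * se)

# Rx row of the model 2 output
z, p = wald_test(1.294, 0.422)
hr_lower, hr_upper = hazard_ratio_ci(1.294, 0.422)
print(f"Z = {z:.3f}, P = {p:.3f}, 95% CI = ({hr_lower:.2f}, {hr_upper:.2f})")
```

The P-value computed this way rounds to the 0.002 shown in the printout.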


V. Computing the Hazard 
Ratio 


$\widehat{HR} = \frac{\hat{h}(t,X^*)}{\hat{h}(t,X)}$

where

$X^* = (X_1^*, X_2^*, \ldots, X_p^*)$

and

$X = (X_1, X_2, \ldots, X_p)$

denote the set of X’s for two
individuals


In general, a hazard ratio (HR) is defined as the
hazard for one individual divided by the hazard for
a different individual. The two individuals being
compared can be distinguished by their values for
the set of predictors, that is, the X’s.

We can write the hazard ratio as the estimate of 
h(t,X*) divided by the estimate of h(t,X), where 
X* denotes the set of predictors for one individual, 
and X denotes the set of predictors for the other 
individual. 


To interpret HR, want HR > 1, i.e., 

h(t,X*) >h(t,X). 

Typical coding: X*: group with 
larger h 
X: group with 
smaller h 


EXAMPLE: Remission Data 


$X^* = (X_1^*, X_2^*, \ldots, X_p^*)$, where $X_1^* = 1$
denotes placebo group.

$X = (X_1, X_2, \ldots, X_p)$, where $X_1 = 0$
denotes treatment group.


Note that, as with an odds ratio, it is easier to
interpret an HR that exceeds the null value of 1 than
an HR that is less than 1. Thus, the X’s are typically
coded so that the group with the larger hazard
corresponds to X*, and the group with the smaller
hazard corresponds to X. As an example, for the
remission data described previously, the placebo
group is coded as X_1* = 1, and the treatment group
is coded as X_1 = 0.
$\widehat{HR} = \frac{\hat{h}(t,X^*)}{\hat{h}(t,X)} = \frac{\hat{h}_0(t)\, e^{\sum_{i=1}^{p} \hat{\beta}_i X_i^*}}{\hat{h}_0(t)\, e^{\sum_{i=1}^{p} \hat{\beta}_i X_i}}$


We now obtain an expression for the HR formula
in terms of the regression coefficients by substituting
the Cox model formula into the numerator
and denominator of the hazard ratio expression.
This substitution is shown here. Notice that the
only difference between the numerator and the
denominator is the X*’s versus the X’s. Notice also
that the baseline hazards will cancel out.


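$\widehat{HR} = \frac{\hat{h}_0(t)\, e^{\sum_{i=1}^{p} \hat{\beta}_i X_i^*}}{\hat{h}_0(t)\, e^{\sum_{i=1}^{p} \hat{\beta}_i X_i}} = e^{\sum_{i=1}^{p} \hat{\beta}_i (X_i^* - X_i)}$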


Using algebra involving exponentials, the hazard
ratio formula simplifies to the exponential expression
shown here. Thus, the hazard ratio is computed
by exponentiating the sum of each β_i “hat”
times the difference between X_i* and X_i.



$\widehat{HR} = \exp\left[\sum_{i=1}^{p} \hat{\beta}_i (X_i^* - X_i)\right]$


An alternative way to write this formula, using ex¬ 
ponential notation, is shown here. We will now 
illustrate the use of this general formula through 
a few examples. 
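
The general formula translates directly into a few lines of Python. The sketch below is only illustrative: the function name is our own, the coefficients are those of model 2 from the output shown earlier, and the log WBC value of 2.5 is an arbitrary placeholder that cancels because it is the same for both individuals.

```python
import math

def hazard_ratio(beta_hat, x_star, x):
    """exp of the sum of beta_hat[i] * (x_star[i] - x[i]) over all predictors."""
    return math.exp(sum(b * (xs - xi) for b, xs, xi in zip(beta_hat, x_star, x)))

# Model 2: coefficients for (Rx, log WBC)
beta_hat = [1.294, 1.604]
x_star = [1, 2.5]  # placebo subject; log WBC = 2.5 is an arbitrary common value
x = [0, 2.5]       # treated subject with the same log WBC
hr = hazard_ratio(beta_hat, x_star, x)
print(round(hr, 2))
```

Because the log WBC terms cancel, the result equals e to the 1.294 (about 3.65) whatever common value is chosen for log WBC.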




Suppose, for example, there is only one X variable
of interest, X_1, which denotes (0,1) exposure
status, so that p = 1. Then, the hazard ratio
comparing exposed to unexposed persons is
obtained by letting X_1* = 1 and X_1 = 0 in the
hazard ratio formula. The estimated hazard ratio
then becomes e to the quantity β_1 “hat” times 1
minus 0, which simplifies to e to the β_1 “hat.”

Recall the remission data printout for model 1, 
which contains only the Rx variable, again shown 
here. Then the estimated hazard ratio is obtained 
by exponentiating the coefficient 1.509, which 
gives the value 4.523 shown in the HR column of 
the output. 

As a second example, consider the output for
model 2, which contains two variables, the Rx
variable and log WBC. Then to obtain the hazard ratio
for the effect of the Rx variable adjusted for the log
WBC variable, we let the vectors X* and X be
defined as X* = (1, log WBC) and X = (0, log WBC).
Here we assume that log WBC is the same for X*
and X though unspecified.
EXAMPLE 2 (continued)


$\widehat{HR} = \exp[\hat{\beta}_1 (X_1^* - X_1) + \hat{\beta}_2 (X_2^* - X_2)]$

$= \exp[1.294(1 - 0) + 1.604(\log WBC - \log WBC)]$

$= \exp[1.294(1) + 1.604(0)] = e^{1.294}$


The estimated hazard ratio is then obtained by 
exponentiating the sum of two quantities, one 
involving the coefficient 1.294 of the Rx variable, 
and the other involving the coefficient 1.604 of the 
log WBC variable. Since the log WBC value is fixed, 
however, this portion of the exponential is zero, so 
that the resulting estimate is simply e to the 1.294. 


General rule: If $X_1$ is a (0,1)
exposure variable, then

$\widehat{HR} = e^{\hat{\beta}_1}$ (= effect of exposure
adjusted for other X’s)
provided no other X’s are product
terms involving exposure.


This second example illustrates the general rule 
that the hazard ratio for the effect of a (0,1) ex¬ 
posure variable which adjusts for other variables 
is obtained by exponentiating the estimated coef¬ 
ficient of the exposure variable. This rule has the 
proviso that the model does not contain any prod¬ 
uct terms involving exposure. 



We now give a third example which illustrates 
how to compute a hazard ratio when the model 
does contain product terms. We consider the 
printout for model 3 of the remission data shown 
here. 

To obtain the hazard ratio for the effect of Rx
adjusted for log WBC using model 3, we consider
X* and X vectors which have three components,
one for each variable in the model. The X* vector,
which denotes a placebo subject, has components
X_1* = 1, X_2* = log WBC, and X_3* = 1 times log
WBC. The X vector, which denotes a treated
subject, has components X_1 = 0, X_2 = log WBC,
and X_3 = 0 times log WBC. Note again that, as
with the previous example, the value for log WBC
is treated as fixed, though unspecified.

Using the general formula for the hazard ratio, 
we must now compute the exponential of the sum 
of three quantities, corresponding to the three 
variables in the model. Substituting the values 
from the printout and the values of the vectors X* 
and X into this formula, we obtain the exponen¬ 
tial expression shown here. Using algebra, this 
expression simplifies to the exponential of 2.355 
minus 0.342 times log WBC. 
EXAMPLE 3 (continued)

log WBC = 2:

$\widehat{HR} = \exp[2.355 - 0.342(2)] = e^{1.671} = 5.32$

log WBC = 4:

$\widehat{HR} = \exp[2.355 - 0.342(4)] = e^{0.987} = 2.68$




In order to get a numerical value for the hazard 
ratio, we must specify a value for log WBC. For 
instance, if log WBC = 2, the estimated hazard 
ratio becomes 5.32, whereas if log WBC = 4, the 
estimated hazard ratio becomes 2.68. Thus, we get 
different hazard ratio values for different values of 
log WBC, which should make sense since log WBC 
is an effect modifier in model 3. 


General rule for (0,1) exposure 
variables when there are product 
terms: 


$\widehat{HR} = \exp\left[\hat{\beta} + \sum_{j} \hat{\delta}_j W_j\right]$

where

$\hat{\beta}$ = coefficient of E
$\hat{\delta}_j$ = coefficient of $E \times W_j$

(HR does not contain coefficients
of non-product terms)


The example we have just described using model
3 illustrates a general rule which states that the
hazard ratio for the effect of a (0,1) exposure
variable in a model which contains product terms
involving this exposure with other X’s can be written
as shown here. Note that β “hat” denotes the
coefficient of the exposure variable and the δ “hats”
are coefficients of product terms in the model of
the form E × W_j. Also note that this formula does
not contain coefficients of nonproduct terms other
than those involving E.



For model 3, β “hat” is the coefficient of the Rx
variable, and there is only one δ “hat” in the
sum, which is the coefficient of the product term
Rx × log WBC. Thus, there is only one W, namely
W_1 = log WBC. The hazard ratio formula for the
effect of exposure is then given by exponentiating
β “hat” plus δ_1 “hat” times log WBC. Substituting
the estimates from the printout into this formula
yields the expression obtained previously, namely
the exponential of 2.355 minus 0.342 times log
WBC.
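
The rule for models with product terms can be sketched in Python as follows. The function name is our own; the values 2.355 and -0.342 are the model 3 estimates quoted in the text, and the two log WBC values are the ones used in the example.

```python
import math

def hazard_ratio_with_product_terms(beta_hat, delta_hats, w_values):
    """exp(beta_hat + sum of delta_hat_j * W_j) for a (0,1) exposure variable."""
    return math.exp(beta_hat + sum(d * w for d, w in zip(delta_hats, w_values)))

# Model 3: beta_hat for Rx, one delta_hat for Rx x log WBC
hr_at_2 = hazard_ratio_with_product_terms(2.355, [-0.342], [2.0])
hr_at_4 = hazard_ratio_with_product_terms(2.355, [-0.342], [4.0])
print(round(hr_at_2, 2), round(hr_at_4, 2))
```

The hazard ratio changes with the value supplied for W_1 = log WBC, which is what it means for log WBC to be an effect modifier in this model.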


VI. Adjusted Survival Curves 
Using the Cox PH Model 

Two primary quantities:

1. estimated hazard ratios
2. estimated survival curves

The two primary quantities desired from a survival
analysis point of view are estimated hazard
ratios and estimated survival curves. Having just
described how to compute hazard ratios, we now
turn to estimation of survival curves using the Cox
model.
No model: use KM curves



Recall that if no model is used to fit survival data, 
a survival curve can be estimated using a Kaplan- 
Meier method. Such KM curves are plotted as step 
functions as shown here for the remission data 
example. 


Cox model: adjusted survival 
curves (also step functions). 


Cox model hazard function:

$h(t,X) = h_0(t)\, e^{\sum_{i=1}^{p} \beta_i X_i}$

Cox model survival function:

$S(t,X) = [S_0(t)]^{e^{\sum_{i=1}^{p} \beta_i X_i}}$


When a Cox model is used to fit survival data, sur¬ 
vival curves can be obtained that adjust for the 
explanatory variables used as predictors. These 
are called adjusted survival curves, and, like KM 
curves, these are also plotted as step functions. 

The hazard function formula for the Cox PH
model, shown here again, can be converted to a
corresponding survival function formula as shown
below. This survival function formula is the basis
for determining adjusted survival curves. Note
that this formula says that the survival function at
time t for a subject with vector X as predictors is
given by a baseline survival function S_0(t) raised
to a power equal to the exponential of the sum of
β_i times X_i.


Estimated survival function:

$\hat{S}(t,X) = [\hat{S}_0(t)]^{e^{\sum_{i=1}^{p} \hat{\beta}_i X_i}}$


The expression for the estimated survival function 
can then be written with the usual “hat” notation 
as shown here. 


$\hat{S}_0(t)$ and $\hat{\beta}_i$ are provided by the
computer program. The $X_i$ must
be specified by the investigator.

The estimates of S_0(t) and β_i are provided by the
computer program that fits the Cox model. The
X’s, however, must first be specified by the
investigator before the computer program can compute
the estimated survival curve.
For example, if we consider model 2 for the
remission data, the fitted model written in terms 
of both the hazard function and corresponding 
survival function is given here. 

We can obtain a specific survival curve by 
specifying values for the vector X, whose compo¬ 
nent variables are Rx and log WBC. 

For instance, if Rx = 1 and log WBC = 2.93, 
the estimated survival curve is obtained by 
substituting these values in the formula as shown 
here, and carrying out the algebra to obtain the 
expression circled. Note that the value 2.93 is the 
overall mean log WBC for the entire dataset of 42 
subjects. 

Also, if Rx = 0 and log WBC = 2.93, the es¬ 
timated survival curve is obtained as shown here. 

Each of the circled expressions gives ad¬ 
justed survival curves, where the adjustment is 
for the values specified for the X’s. Note that for 
each expression, a survival probability can be 
obtained for any value of t. 

The two formulae just obtained, again shown 
here, allow us to compare survival curves for dif¬ 
ferent treatment groups adjusted for the covariate 
log WBC. Both curves describe estimated survival 
probabilities over time assuming the same value 
of log WBC, in this case, the value 2.93. 


Typically, use $X = \bar{X}$ or $X_{median}$.
Computer uses $\bar{X}$.

Typically, when computing adjusted survival
curves, the value chosen for a covariate being
adjusted is an average value like an arithmetic mean
or a median. In fact, most computer programs for
the Cox model automatically use the mean value
over all subjects for each covariate being adjusted.

In our example, the mean log WBC for all 42
subjects in the remission data set is 2.93. That is why
we chose this value for log WBC in the formulae
for the adjusted survival curve.
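
To make the arithmetic behind these adjusted curves concrete, the Python sketch below computes the power to which the estimated baseline survival function is raised for each treatment group under model 2, with log WBC set to its mean of 2.93. The function names are our own, and the baseline survival value of 0.999 is a made-up placeholder, since the baseline estimates come from the fitted model and are not listed in the text.

```python
import math

def survival_exponent(beta_hat, x):
    """Power applied to the baseline survival: exp(sum of beta_hat[i] * x[i])."""
    return math.exp(sum(b * xi for b, xi in zip(beta_hat, x)))

def adjusted_survival(s0_t, beta_hat, x):
    """Adjusted survival probability at time t, given the baseline value S0(t)."""
    return s0_t ** survival_exponent(beta_hat, x)

beta_hat = [1.294, 1.604]  # model 2 coefficients for (Rx, log WBC)
power_placebo = survival_exponent(beta_hat, [1, 2.93])  # Rx = 1
power_treated = survival_exponent(beta_hat, [0, 2.93])  # Rx = 0

s0_t = 0.999  # hypothetical baseline survival at some time t (assumption)
s_placebo = adjusted_survival(s0_t, beta_hat, [1, 2.93])
s_treated = adjusted_survival(s0_t, beta_hat, [0, 2.93])
print(round(power_placebo, 1), round(power_treated, 1))
```

Because the placebo group has the larger power and the baseline survival lies between 0 and 1, its adjusted survival probability is the lower of the two at every time t.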


EXAMPLE (continued) 


Remission data (n = 42): 


$\overline{\log WBC} = 2.93$
General formulae for adjusted
survival curves comparing two
groups:

Exposed subjects:

$\hat{S}(t, X_1) = [\hat{S}_0(t)]^{\exp[\hat{\beta}_1(1) + \sum_{i \neq 1} \hat{\beta}_i \bar{X}_i]}$

Unexposed subjects:

$\hat{S}(t, X_0) = [\hat{S}_0(t)]^{\exp[\hat{\beta}_1(0) + \sum_{i \neq 1} \hat{\beta}_i \bar{X}_i]}$

General formula for adjusted
survival curve for all
covariates in the model:

$\hat{S}(t, \bar{X}) = [\hat{S}_0(t)]^{\exp[\sum_{i} \hat{\beta}_i \bar{X}_i]}$


More generally, if we want to compare survival
curves for two levels of an exposure variable, and
we want to adjust for several covariates, we can
write the formula for each curve as shown here.
Note that we are assuming that the exposure
variable is variable X_1, whose estimated coefficient is
β_1 “hat,” and the value of X_1 is 1 for exposed and
0 for unexposed subjects.


Also, if we want to obtain an adjusted survival 
curve which adjusts for all covariates in the model, 
the general formula which uses the mean value 
for each covariate is given as shown here. This 
formula will give a single adjusted survival curve 
rather than different curves for each exposure 
group. 


EXAMPLE (continued) 


Single survival curve for Cox model 
containing Rx and log WBC: 

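$\overline{Rx} = 0.50$

$\overline{\log WBC} = 2.93$

$\hat{S}(t, \bar{X}) = [\hat{S}_0(t)]^{\exp(\hat{\beta}_1 \overline{Rx} + \hat{\beta}_2 \overline{\log WBC})}$

$= [\hat{S}_0(t)]^{\exp(1.294(0.5) + 1.604(2.93))}$

$= [\hat{S}_0(t)]^{\exp(5.35)} = [\hat{S}_0(t)]^{210.6}$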


To illustrate this formula, suppose we again con¬ 
sider the remission data, and we wish to obtain a 
single survival curve that adjusts for both Rx and 
log WBC in the fitted Cox model containing these 
two variables. Using the mean value of each co¬ 
variate, we find that the mean value for Rx is 0.5 
and the mean value for log WBC is 2.93, as before. 

To obtain the single survival curve that adjusts for 
Rx and log WBC, we then substitute the mean val¬ 
ues in the formula for the adjusted survival curve 
for the model fitted. The formula and the result¬ 
ing expression for the adjusted survival curve are 
shown here. (Note that for the remission data, 
where it is of interest to compare two exposure 
groups, the use of a single survival curve is not 
appropriate.) 


Compute survival probability by
specifying value for t in
$\hat{S}(t, \bar{X}) = [\hat{S}_0(t)]^{210.6}$

Computer uses t’s which are
failure times.

From this expression for the survival curve, a
survival probability can be computed for any value
of t that is specified. When graphing this survival
curve using a computer package, the values of t
that are chosen are the failure times of all persons
in the study who got the event. This process is
automatically carried out by the computer without
having the user specify each failure time.
EXAMPLE


Adjusted Survival Curves for Treatment and 
Placebo Groups 

S(t) 



The graph of adjusted survival curves obtained 
from fitting a Cox model is usually plotted as a 
step function. For example, we show here the step 
functions for the two adjusted survival curves ob¬ 
tained by specifying either 1 or 0 for treatment 
status and letting log WBC be the mean value 2.93. 


Next section: PH assumption 

• explain meaning 

• when PH not satisfied 


We now turn to the concept of the proportional 
hazard (PH) assumption. In the next section, we 
explain the meaning of this assumption and we 
give an example of when this assumption is not 
satisfied. 


Later presentations: 

• how to evaluate PH 

• analysis when PH not met 


In later presentations, we expand on this subject, 
describing how to evaluate statistically whether 
the assumption is met and how to carry out the 
analysis when the assumption is not met. 


VII. The Meaning of the 
PH Assumption 

PH: HR is constant over time, i.e.,
$\hat{h}(t,X^*) = \text{constant} \times \hat{h}(t,X)$


The PH assumption requires that the HR is con¬ 
stant over time, or equivalently, that the hazard 
for one individual is proportional to the hazard 
for any other individual, where the proportional¬ 
ity constant is independent of time. 


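$\widehat{HR} = \frac{\hat{h}(t,X^*)}{\hat{h}(t,X)} = \frac{\hat{h}_0(t) \exp\left[\sum_{i=1}^{p} \hat{\beta}_i X_i^*\right]}{\hat{h}_0(t) \exp\left[\sum_{i=1}^{p} \hat{\beta}_i X_i\right]} = \exp\left[\sum_{i=1}^{p} \hat{\beta}_i (X_i^* - X_i)\right]$

where $X^* = (X_1^*, X_2^*, \ldots, X_p^*)$ and
$X = (X_1, X_2, \ldots, X_p)$

denote the set of X’s for two
individuals.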


To understand the PH assumption, we need to
reconsider the formula for the HR that compares
two different specifications X* and X for the
explanatory variables used in the Cox model. We
derived this formula previously in Section V, and we
show this derivation again here. Notice that the
baseline hazard function h_0(t) appears in both the
numerator and denominator of the hazard ratio
and cancels out of the formula.
$\frac{\hat{h}(t,X^*)}{\hat{h}(t,X)} = \exp\left[\sum_{i=1}^{p} \hat{\beta}_i (X_i^* - X_i)\right]$

does not involve t.


The final expression for the hazard ratio therefore
involves the estimated coefficients β_i “hat” and
the values of X* and X for each variable. However,
because the baseline hazard has canceled out, the
final expression does not involve time t.


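Let

$\hat{\theta} = \exp\left[\sum_{i=1}^{p} \hat{\beta}_i (X_i^* - X_i)\right]$ (a constant);

then

$\frac{\hat{h}(t,X^*)}{\hat{h}(t,X)} = \hat{\theta}$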


Thus, once the model is fitted and the values for
X* and X are specified, the value of the exponential
expression for the estimated hazard ratio is a
constant, which does not depend on time. If we
denote this constant by θ “hat,” then we can write
the hazard ratio as shown here. This is a
mathematical expression which states the proportional
hazards assumption.


[Figure: $\widehat{HR}$ (X* versus X) plotted against t as a constant]


Graphically, this expression says that the esti¬ 
mated hazard ratio comparing any two individ¬ 
uals plots as a constant over time. 


$\hat{h}(t, X^*) = \hat{\theta}\, \hat{h}(t, X)$

Proportionality constant: $\hat{\theta}$
(not dependent on time)


Another way to write the proportional hazards
assumption mathematically expresses the hazard
function for individual X* as θ “hat” times the
hazard function for individual X, as shown here.
This expression says that the hazard function for
one individual is proportional to the hazard
function for another individual, where the
proportionality constant is θ “hat,” which does not depend
on time.



To illustrate the proportional hazard assumption, 
we again consider the Cox model for the remission 
data involving the two variables Rx and log WBC. 
For this model, the estimated hazard ratio that 
compares placebo (Rx = 1) with treated (Rx = 0) 
subjects controlling for log WBC is given by e to 
the 1.294, which is 3.65, a constant. 

Thus, the hazard for placebo group (Rx = 1) is 
3.65 times the hazard for the treatment group 
(Rx = 0), and the value, 3.65, is the same re¬ 
gardless of time. In other words, using the above 
model, the hazard for the placebo group is propor¬ 
tional to the hazard for the treatment group, and 
the proportionality constant is 3.65. 
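
The cancellation of the baseline hazard can also be checked numerically. In the Python sketch below, the baseline hazard function is an arbitrary made-up choice (the Cox model leaves it unspecified), the function names are our own, and the coefficients are those of model 2 with log WBC fixed at 2.93 for both groups.

```python
import math

def cox_hazard(t, x, beta_hat, baseline_hazard):
    """Cox model hazard: baseline hazard at t times exp(sum of beta_hat[i] * x[i])."""
    return baseline_hazard(t) * math.exp(sum(b * xi for b, xi in zip(beta_hat, x)))

def h0(t):
    """Hypothetical baseline hazard, chosen only for illustration."""
    return 0.01 * (1.0 + t)

beta_hat = [1.294, 1.604]  # model 2 coefficients for (Rx, log WBC)
ratios = [
    cox_hazard(t, [1, 2.93], beta_hat, h0) / cox_hazard(t, [0, 2.93], beta_hat, h0)
    for t in (1, 5, 10, 20)
]
print([round(r, 2) for r in ratios])
```

Whatever function is substituted for the baseline hazard, every entry of the list equals e to the 1.294, since the baseline hazard appears in both numerator and denominator.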
To further illustrate the concept of proportional
hazards, we now provide an example of a situation 
for which the proportional hazards assumption is 
not satisfied. 

For our example, we consider a study in which 
cancer patients are randomized to either surgery 
or radiation therapy without surgery. Thus, we 
have a (0,1) exposure variable denoting surgery 
status, with 0 if a patient receives surgery and 1 
if not. Suppose further that this exposure variable 
is the only variable of interest, so that a Cox PH 
model for the analysis of this data, as shown here, 
will contain only the one variable E, denoting 
exposure. 

Now the question we consider here is whether the 
above Cox model containing the variable E is an 
appropriate model to use for this situation. To an¬ 
swer this question we note that when a patient 
undergoes serious surgery, as when removing a 
cancerous tumor, there is usually a high risk for 
complications from surgery or perhaps even death 
early in the recovery process, and once the patient 
gets past this early critical period, the benefits of 
surgery, if any, can then be observed. 

Thus, in a study that compares surgery to no 
surgery, we might expect to see hazard functions 
for each group that appear as shown here. No¬ 
tice that these two functions cross at about three 
days, and that prior to three days the hazard 
for the surgery group is higher than the hazard 
for the no surgery group, whereas after three days, 
the hazard for the surgery group is lower than the 
hazard for the no surgery group. 

Looking at the above graph more closely, we can 
see that at 2 days, when t = 2, the hazard ratio of 
non-surgery (E = 1) to surgery (E = 0) patients 
yields a value less than 1. In contrast, at t = 5 days, 
the hazard ratio of nonsurgery to surgery yields a 
value greater than 1. 














EXAMPLE: (continued) 


Given the above description, HR is not 

constant over time. 


Cox PH model inappropriate because 
PH model assumes constant HR: 


$h(t,X) = h_0(t)\, e^{\beta E}$


$\widehat{HR} = \frac{\hat{h}(t, E = 1)}{\hat{h}(t, E = 0)} = e^{\hat{\beta}}$


Thus, if the above description of the hazard func¬ 
tions for each group is accurate, the hazard ratios 
are not constant over time. That is, the hazard ra¬ 
tio is some number less than 1 before three days 
and greater than 1 after three days. 

It is therefore inappropriate to use a Cox PH model 
for this situation, because the PH model assumes a 
constant hazard ratio across time, whereas our sit¬ 
uation yields a hazard ratio that varies with time. 

In fact, if we use a Cox PH model, shown here
again, the estimated hazard ratio comparing
exposed to unexposed patients at any time is given
by the constant value e to the β “hat,” which does
not vary over time.


General rule: 

If the hazards cross, then a Cox 
PH model is not appropriate. 


This example illustrates the general rule that if the 
hazards cross, then the PH assumption cannot be 
met, so that a Cox PH model is inappropriate. 


Analysis when Cox PH model not
appropriate? See Chapters 5 and 6.

It is natural to ask at this point, if the Cox PH
model is inappropriate, how should we carry out
the analysis? The answer to this question is
discussed in Chapters 5 and 6. However, we will give
a brief reply with regard to the surgery study
example just described.


EXAMPLE (continued) 


Surgery study analysis options: 

• stratify by exposure (use KM curves) 

• start analysis at three days; use Cox 
PH model 

• fit PH model for < 3 days and for > 3
days; get HR (< 3 days) and HR
(> 3 days)

• include time-dependent variable
(e.g., E × t); use extended Cox model


Actually, for the surgery study there are several 

options available for the analysis. These include: 

• analyze by stratifying on the exposure vari¬ 
able; that is, do not fit any model, and, instead 
obtain Kaplan-Meier curves for each exposure 
group separately; 

• start the analysis at three days, and use a Cox 
PH model on three-day survivors; 

• fit Cox model for less than three days and a 
different Cox model for greater than three days 
to get two different hazard ratio estimates, one 
for each of these two time periods; 

• fit a modified Cox model that includes a time- 
dependent variable which measures the inter¬ 
action of exposure with time. This model is 
called an extended Cox model. 
Different options may lead to
different conclusions.


Hazards cross ⇒ PH not met

but

? ⇒ PH met

See Chapter 4: Evaluating PH 
Assumption 


Further discussion of these options is given in sub¬ 
sequent chapters. We point out here, that differ¬ 
ent options may lead to different conclusions, so 
that the investigator may have to weigh the relative 
merits of each option in light of the data actually 
obtained before deciding on any particular option 
as best. 

One final comment before concluding this section: 
although we have shown that when the hazards 
cross, the PH assumption is not met, we have not 
shown how to decide when the PH assumption 
is met. This is the subject of Chapter 4 entitled, 
“Evaluating the PH Assumption.” 


VIII. The Cox Likelihood

Likelihood 

• Typically based on outcome 
distribution 

• Outcome distribution not 
specified for Cox model 

• Cox likelihood based on order 
of events rather than their 
distribution 

o Called partial likelihood 


Typically, the formulation of a likelihood function 
is based on the distribution of the outcome. How¬ 
ever, one of the key features of the Cox model is 
that there is not an assumed distribution for the 
outcome variable (i.e., the time to event). There¬ 
fore, in contrast to a parametric model, a full like¬ 
lihood based on the outcome distribution cannot 
be formulated for the Cox PH model. Instead, the 
construction of the Cox likelihood is based on 
the observed order of events rather than the 
joint distribution of events. Thus the Cox likeli¬ 
hood is called a “partial” likelihood. 


Illustration 

Scenario: 

• Gary, Larry, Barry have lottery 
tickets 

• Winning tickets chosen at times
$t_1, t_2, \ldots$

• Each person ultimately chosen 

• Can be chosen only once 


To illustrate the idea underlying the formulation
of the Cox model, consider the following scenario.
Suppose Gary, Larry, and Barry are each given a
lottery ticket. Winning tickets are chosen at times
t_j (j = 1, 2, ...). Assume each person is ultimately
chosen and once a person is chosen he cannot be
chosen again (i.e., he is out of the risk set). What is
the probability that the order in which the three are
chosen is first Barry, then Gary, and finally Larry?


Question: 

What is the probability that the 
order chosen is as follows? 


1. Barry 

2. Gary 

3. Larry 
Answer:

Probability = $\frac{1}{3} \times \frac{1}{2} \times \frac{1}{1} = \frac{1}{6}$

(the three factors correspond, in order, to Barry, Gary, and Larry)


The probability that Barry's ticket is chosen before
Gary's and Larry's is one out of three. Once Barry's
ticket is chosen it cannot be chosen again. The
probability that Gary’s ticket is then chosen before
Larry's is one out of two. Once Barry’s and Gary's
tickets are chosen they cannot be chosen again,
which means that Larry’s ticket must be chosen
last. This yields a probability of 1/6 for this given
order of events (see left).


Scenario: 

Barry - 4 tickets 
Gary - 1 ticket 
Larry - 2 tickets 


Now consider a modification of the previous sce¬ 
nario. Suppose Barry has 4 tickets, Gary has 
1 ticket, and Larry has 2 tickets; now what is the 
probability that the order each person is chosen is 
first Barry, then Gary, and finally Larry? 


Question: 

What is the probability that the 
order chosen is as follows? 

1. Barry 

2. Gary 

3. Larry 


Barry, Gary, and Larry have 7 tickets in all and 
Barry owns 4 of them so Barry’s probability of be¬ 
ing chosen first is 4 out of 7. After Barry is chosen, 
Gary has 1 of the 3 remaining tickets and after 
Barry and Gary are chosen, Larry owns the re¬ 
maining 2 tickets. This yields a probability of 4/21 
for this order (see left). 


Answer:

Probability = $\frac{4}{7} \times \frac{1}{3} \times \frac{2}{2} = \frac{4}{21}$
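
Both lottery answers can be reproduced with a short Python sketch using exact fractions. The function name is our own; the ticket counts are those given in the two scenarios, listed in the order Barry, Gary, Larry.

```python
from fractions import Fraction

def ordering_probability(tickets_in_order):
    """Probability that people are chosen in the listed order, without replacement.

    Each factor is (tickets held by the person chosen) divided by
    (tickets held by everyone still eligible to be chosen).
    """
    probability = Fraction(1)
    remaining = sum(tickets_in_order)
    for n in tickets_in_order:
        probability *= Fraction(n, remaining)
        remaining -= n  # the chosen person leaves the "risk set"
    return probability

equal_tickets = ordering_probability([1, 1, 1])    # one ticket each
unequal_tickets = ordering_probability([4, 1, 2])  # Barry 4, Gary 1, Larry 2
print(equal_tickets, unequal_tickets)
```

The shrinking denominator in each factor plays the same role as the shrinking risk set in the Cox likelihood.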


For this scenario

Subject's number of tickets
affects probability

For Cox model

Subject's pattern of covariates
affects likelihood of ordered
events

For this scenario, the probability of a particular
order is affected by the number of tickets held by
each subject. For a Cox model, the likelihood of the
observed order of events is affected by the pattern
of covariates of each subject.
ID       TIME   STATUS   SMOKE
Barry    2      1        1
Gary     3      1        0
Harry    5      0        0
Larry    8      1        1


TIME = Survival time (in years)
STATUS = 1 for event, 0 for
censorship

SMOKE = 1 for a smoker, 0 for a
nonsmoker

Cox PH model
$h(t) = h_0(t)\, e^{\beta_1 SMOKE}$


ID       Hazard
Barry    $h_0(t)\, e^{\beta_1}$
Gary     $h_0(t)\, e^{0}$
Harry    $h_0(t)\, e^{0}$
Larry    $h_0(t)\, e^{\beta_1}$


Individual hazards (Cox likelihood)
analogous to number of tickets
(lottery scenario). For example,
smokers analogous to persons with
extra lottery tickets

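Cox Likelihood

$L = \left[\frac{h_0(t)e^{\beta_1}}{h_0(t)e^{\beta_1} + h_0(t)e^{0} + h_0(t)e^{0} + h_0(t)e^{\beta_1}}\right] \times \left[\frac{h_0(t)e^{0}}{h_0(t)e^{0} + h_0(t)e^{0} + h_0(t)e^{\beta_1}}\right] \times \left[\frac{h_0(t)e^{\beta_1}}{h_0(t)e^{\beta_1}}\right]$

Likelihood is product of 3 terms

$L = L_1 \times L_2 \times L_3$

$L_1 = \left[\frac{h_0(t)e^{\beta_1}}{h_0(t)e^{\beta_1} + h_0(t)e^{0} + h_0(t)e^{0} + h_0(t)e^{\beta_1}}\right]$

$L_2 = \left[\frac{h_0(t)e^{0}}{h_0(t)e^{0} + h_0(t)e^{0} + h_0(t)e^{\beta_1}}\right]$

$L_3 = \left[\frac{h_0(t)e^{\beta_1}}{h_0(t)e^{\beta_1}}\right]$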

To illustrate this connection, consider the dataset 
shown on the left. The data indicate that Barry got 
the event at TIME = 2 years. Gary got the event at 
3 years, Harry was censored at 5 years, and Larry 
got the event at 8 years. Furthermore, Barry and 
Larry were smokers whereas Gary and Harry were 
nonsmokers. 


Consider the Cox proportional hazards model
with one predictor, SMOKE. Under this model the
hazards for Barry, Gary, Harry, and Larry can be
expressed as shown on the left. The individual hazards
are determined by whether the subject was a
smoker or nonsmoker.

The individual-level hazards play a role in the
construction of the Cox likelihood analogous to the
role that the number of tickets held by each subject
plays in calculating the probabilities in the
lottery scenario discussed earlier in this section.
The subjects who smoke are analogous to persons
given extra lottery tickets, thereby affecting the
probability of a particular order of events.

On the left is the Cox likelihood for these data.
Notice the likelihood is a product of three terms,
which correspond to the three event times. Barry
got the event first at TIME = 2 years. At that time
all four subjects were at risk for the event. The
first term ($L_1$) has the sum of the four subjects'
hazards in the denominator and Barry's hazard in
the numerator. Gary got the event next at 3 years
when Gary, Harry, and Larry were still in the
risk set. Consequently, the second term ($L_2$) has
the sum of the three hazards for the subjects still
at risk in the denominator and Gary's hazard in
the numerator. Harry was censored at 5 years,
which occurred between the second and third
events. Therefore, when Larry got the final event
at 8 years, nobody else was at risk for the event.
As a result, the third term ($L_3$) just has Larry's
hazard in the denominator and the numerator.
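
The construction of the three terms can be traced in a few lines of code. The following is a minimal sketch, not a general Cox routine: it assumes one covariate (SMOKE) and no tied event times, and the names `SUBJECTS` and `cox_likelihood_terms` are ours, introduced only for illustration. The baseline hazard is left out because it cancels in every term.

```python
import math

# (id, time, status, smoke) for the four subjects; status 1 = event, 0 = censored.
SUBJECTS = [
    ("Barry", 2, 1, 1),
    ("Gary", 3, 1, 0),
    ("Harry", 5, 0, 0),
    ("Larry", 8, 1, 1),
]


def cox_likelihood_terms(subjects, beta1):
    """Return one likelihood term per event time, in time order.

    h0(t) is omitted because it cancels in every term.
    Assumes a single covariate and no tied event times.
    """
    terms = []
    for _, time, status, smoke in sorted(subjects, key=lambda row: row[1]):
        if status == 0:
            continue  # a censored subject only ever appears in denominators
        # Risk set: everyone whose observed time is at least this event time.
        risk_set = [row for row in subjects if row[1] >= time]
        numerator = math.exp(beta1 * smoke)
        denominator = sum(math.exp(beta1 * row[3]) for row in risk_set)
        terms.append(numerator / denominator)
    return terms


terms = cox_likelihood_terms(SUBJECTS, beta1=0.0)
likelihood = math.prod(terms)
print(terms, likelihood)
```

With `beta1 = 0` every subject holds the same "number of tickets", so the three terms reduce to 1/4, 1/3, and 1, which mirrors the risk sets of four, three, and one subject described above.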
$t_1$, time = 2, four at risk ($L_1$)
$t_2$, time = 3, three at risk ($L_2$)
$t_3$, time = 8, one at risk ($L_3$)

For each term: 

Numerator—single hazard 
Denominator—sum of hazards 

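Baseline hazard, $h_0(t)$, cancels

\[ L = \left[\frac{e^{\beta_1}}{e^{\beta_1} + e^{0} + e^{0} + e^{\beta_1}}\right] \times \left[\frac{e^{0}}{e^{0} + e^{0} + e^{\beta_1}}\right] \times \left[\frac{e^{\beta_1}}{e^{\beta_1}}\right] \]

Thus, L does not depend on $h_0(t)$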


Data A

ID       TIME   STATUS   SMOKE
Barry    2      1        1
Gary     3      1        0
Harry    5      0        0
Larry    8      1        1

Data B

ID       TIME   STATUS   SMOKE
Barry    1      1        1
Gary     7      1        0
Harry    8      0        0
Larry    63     1        1


Comparing datasets 

• TIME variable differs 

• Order of events the same 

• Cox PH likelihood the same 


To summarize, the likelihood in our example consists
of a product of three terms ($L_1$, $L_2$, and $L_3$)
corresponding to the ordered failure times ($t_1$, $t_2$,
and $t_3$). The denominator for the term corresponding
to time $t_j$ (j = 1, 2, 3) is the sum of the hazards
for those subjects still at risk at time $t_j$, and the
numerator is the hazard for the subject who got
the event at $t_j$.

A key property of the Cox likelihood is that the
baseline hazard cancels out in each term. Thus,
the form of the baseline hazard need not be specified
in a Cox model, as it plays no role in the estimation
of the regression parameters. By factoring
$h_0(t)$ out of the denominator and then canceling it
in each term, the likelihood for Barry, Gary, and
Larry can be rewritten as shown on the left.

As we mentioned earlier, the Cox likelihood is determined
by the order of events and censorships
and not by the distribution of the outcome variable.
To illustrate this point, compare datasets
A and B on the left, and consider the likelihood
for a Cox PH model with smoking status as the
only predictor. Although the values for the variable
TIME differ in the two datasets, the Cox likelihood
will be the same using either dataset because the
order of the outcome (TIME) remains unchanged.
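
This claim can be checked numerically. The sketch below is only an illustration under stated assumptions: one covariate, no tied times, and helper names (`DATA_A`, `DATA_B`, `cox_likelihood`) that are ours rather than part of any package. It evaluates the Cox likelihood for both datasets at several trial values of the coefficient.

```python
import math

# (time, status, smoke) rows in the order Barry, Gary, Harry, Larry.
DATA_A = [(2, 1, 1), (3, 1, 0), (5, 0, 0), (8, 1, 1)]
DATA_B = [(1, 1, 1), (7, 1, 0), (8, 0, 0), (63, 1, 1)]


def cox_likelihood(rows, beta1):
    """Cox partial likelihood with a single covariate and no tied times."""
    likelihood = 1.0
    for time, status, smoke in rows:
        if status == 1:
            # Sum of exp(beta1 * smoke) over subjects still at risk at this time.
            at_risk = sum(math.exp(beta1 * x) for t, _, x in rows if t >= time)
            likelihood *= math.exp(beta1 * smoke) / at_risk
    return likelihood


for beta1 in (-1.0, 0.0, 0.5, 2.0):
    print(beta1, cox_likelihood(DATA_A, beta1), cox_likelihood(DATA_B, beta1))
```

Because only the comparison `t >= time` uses the TIME values, any relabeling of the times that preserves their order leaves every risk set, and hence the likelihood, unchanged.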
General Approach

• k failure times

• Likelihood a product of k
terms

• Construction of each term
similar to Barry, Gary, and
Larry

\[ L = L_1 \times L_2 \times L_3 \times \cdots \times L_k = \prod_{j=1}^{k} L_j \]

We have used a small dataset (four observations
with three failure times) for ease of illustration.
However, the approach can be generalized. Consider
a dataset with k failure times and let $L_j$
denote the contribution to the likelihood corresponding
to the jth failure time. Then the Cox likelihood
can be formulated as a product of the
k terms as shown on the left. Each of the terms
$L_j$ is constructed in a similar manner as with the
data for Gary, Larry, and Barry.

Obtaining maximum likelihood
estimates

Solve system of equations

\[ \frac{\partial \ln L}{\partial \beta_i} = 0, \quad i = 1, 2, 3, \ldots, p \]

p = # of parameters

Once the likelihood is formulated, the question becomes:
which values of the regression parameters
would maximize L? The process of maximizing the
likelihood is typically carried out by setting the
partial derivative of the natural log of L with
respect to each parameter to zero
and then solving the system of equations (called
the score equations).
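
For the four-subject example there is a single parameter, so the system of score equations reduces to one equation. The sketch below solves it with Newton-Raphson iteration. Both the choice of Newton-Raphson and the function names are our assumptions for illustration; the text does not say which numerical method a given package uses.

```python
import math

# (time, status, smoke) for Barry, Gary, Harry, Larry.
ROWS = [(2, 1, 1), (3, 1, 0), (5, 0, 0), (8, 1, 1)]


def score_and_information(rows, beta1):
    """Return d(ln L)/d(beta1) and minus the second derivative at beta1."""
    score, information = 0.0, 0.0
    for time, status, smoke in rows:
        if status == 0:
            continue
        # Weights exp(beta1 * x) for the subjects at risk at this event time.
        risk = [(x, math.exp(beta1 * x)) for t, _, x in rows if t >= time]
        total = sum(w for _, w in risk)
        mean_x = sum(x * w for x, w in risk) / total
        mean_x2 = sum(x * x * w for x, w in risk) / total
        score += smoke - mean_x
        information += mean_x2 - mean_x ** 2
    return score, information


def maximize(rows, beta1=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson iteration on the single score equation."""
    for _ in range(max_iter):
        score, information = score_and_information(rows, beta1)
        if information == 0.0:
            break  # flat likelihood: no further update is possible
        step = score / information
        beta1 += step
        if abs(step) < tol:
            break
    return beta1


beta_hat = maximize(ROWS)
print(beta_hat, math.exp(beta_hat))
```

For these data the score equation can also be solved by hand: writing $x = e^{\beta_1}$, it becomes $1 = x/(x+1) + x/(x+2)$, which gives $x^2 = 2$. The iteration should therefore settle at $\hat{\beta}_1 = \tfrac{1}{2}\ln 2$, a useful check on the code.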


IX. Summary 

1. Review: S(t), h(t), data layout, 
etc. 

2. Computer example of Cox model: 

• estimate HR 

• test hypothesis about HR 

• obtain confidence intervals 

3. Cox model formula:

\[ h(t,\mathbf{X}) = h_0(t)\, e^{\sum \beta_i X_i} \]


4. Why popular: Cox PH model is 
“robust” 


In this section we briefly summarize the content 
covered in this presentation. 


• We began with a computer example that uses
the Cox PH model. We showed how to use
the output to estimate the HR, and how to
test hypotheses and obtain confidence intervals
about the hazard ratio.

• We then provided the formula for the hazard
function for the Cox PH model and described
basic features of this model. The most important
feature is that the model contains two
components, namely, a baseline hazard function
of time and an exponential function involving
X's but not time.

• We discussed reasons why the Cox model is
popular, the primary reason being that the
model is "robust" for many different survival
analysis situations.
5. ML estimation: maximize a
partial likelihood 
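L = joint probability of
observed data = $L(\boldsymbol{\beta})$

6. Hazard ratio formula:

\[ \widehat{HR} = \exp\left[\sum_{i=1}^{p} \hat{\beta}_i (X_i^* - X_i)\right] \]

7. Adjusted survival curves:
Comparing E groups (E = 0 or 1):

\[ \hat{S}(t,\mathbf{X}) = [\hat{S}_0(t)]^{\exp[\hat{\beta}_1 E + \sum_{i \neq 1} \hat{\beta}_i \bar{X}_i]} \]

Single curve:

\[ \hat{S}(t,\bar{\mathbf{X}}) = [\hat{S}_0(t)]^{\exp[\sum \hat{\beta}_i \bar{X}_i]} \]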

• We then discussed ML estimation of the parameters
in the Cox model, and pointed out
that the ML procedure maximizes a "partial"
likelihood that focuses on probabilities at failure
times only.

• Next, we gave a general formula for estimating
a hazard ratio that compared two specifications
of the X's, defined as X* and X. We
illustrated the use of this formula when comparing
two exposure groups adjusted for other
variables.

• We then defined an adjusted survival curve
and presented formulas for adjusted curves
comparing two groups adjusted for other variables
in the model and a formula for a single
adjusted curve that adjusts for all X's in the
model. Computer packages for these formulae
use the mean value of each X being adjusted
in the computation of the adjusted curve.


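8. PH assumption:

\[ \frac{\hat{h}(t,\mathbf{X}^*)}{\hat{h}(t,\mathbf{X})} = \hat{\theta} \quad \text{(a constant over } t\text{)} \]

i.e., $\hat{h}(t,\mathbf{X}^*) = \hat{\theta}\,\hat{h}(t,\mathbf{X})$

Hazards cross ⇒ PH not met

9. Cox PH likelihood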


• We described the PH assumption as meaning
that the hazard ratio is constant over time, or
equivalently that the hazard for one individual
is proportional to the hazard for any other individual,
where the proportionality constant is
independent of time. We also showed that for
study situations in which the hazards cross,
the PH assumption is not met.

• Finally, we described how the Cox likelihood is
developed using the ordered failure times from
the data.


Chapters

1. Introduction to Survival
Analysis

2. Kaplan-Meier Survival Curves
and the Log-Rank Test

3. The Cox Proportional Hazards
Model and Its Characteristics

4. Evaluating the Proportional
Hazards Assumption

5. The Stratified Cox Procedure

6. Extension of the Cox
Proportional Hazards Model for
Time-Dependent Variables


This presentation is now complete. We recommend
that the reader review the detailed outline
that follows and then do the practice exercises and
test.

The next chapter (4) describes how to evaluate the
PH assumption. Chapters 5 and 6 describe methods
for carrying out the analysis when the PH assumption
is not met.
Detailed

Outline 


I. A computer example using the Cox PH model 


A. Printout shown for three models involving 
leukemia remission data. 

B. Three explanatory variables of interest: treatment 
status, log WBC, and product term; outcome is 
time until subject goes out of remission. 

C. Discussion of how to evaluate which model is best. 

D. Similarity to classical regression and logistic 
regression. 

II. The formula for the Cox PH model

A. $h(t,\mathbf{X}) = h_0(t)\exp\left[\sum_{i=1}^{p} \beta_i X_i\right]$


B. $h_0(t)$ is called the baseline hazard function.

C. X denotes a collection of p explanatory variables
$X_1, X_2, \ldots, X_p$.

D. The model is semiparametric because $h_0(t)$ is
unspecified.

E. Examples of the Cox model using the leukemia 
remission data. 

F. Survival curves can be derived from the Cox PH 
model. 

III. Why the Cox PH model is popular

A. Can get an estimate of effect (the hazard ratio)
without needing to know $h_0(t)$.

B. Can estimate $h_0(t)$, $h(t,\mathbf{X})$, and survivor functions,
even though $h_0(t)$ is not specified.

C. The e part of the formula is used to ensure that the
fitted hazard is nonnegative.

D. The Cox model is "robust": it usually fits the data
well no matter which parametric model is
appropriate.

IV. ML estimation of the Cox PH model

A. Likelihood function is maximized. 

B. L is called a partial likelihood, because it uses 
survival time information only on failures, and 
does not use censored information explicitly. 

C. L makes use of the risk set at each time that a 
subject fails. 

D. Inferences are made using standard large-sample
ML techniques, e.g., Wald or likelihood ratio tests
and large-sample confidence intervals based on
asymptotic normality assumptions.
V. Computing the hazard ratio

A. Formula for hazard ratio comparing two
individuals, $\mathbf{X}^* = (X_1^*, X_2^*, \ldots, X_p^*)$ and
$\mathbf{X} = (X_1, X_2, \ldots, X_p)$:

\[ \widehat{HR} = \exp\left[\sum_{i=1}^{p} \hat{\beta}_i (X_i^* - X_i)\right] \]

B. Examples are given using a (0,1) exposure variable,
potential confounders, and potential effect
modifiers.

C. Typical coding identifies X* as the group with the
larger hazard and X as the group with the smaller
hazard, e.g., $X_1^* = 1$ for unexposed group and
$X_1 = 0$ for exposed group.

VI. Adjusted survival curves using the Cox PH model

A. Survival curve formula can be obtained from
hazard ratio formula:

\[ S(t,\mathbf{X}) = [S_0(t)]^{\exp[\sum \beta_i X_i]} \]

where $S_0(t)$ is the baseline survival function that
corresponds to the baseline hazard function $h_0(t)$.

B. To graph $S(t,\mathbf{X})$, must specify values for
$\mathbf{X} = (X_1, X_2, \ldots, X_p)$.

C. To obtain "adjusted" survival curves, usually use
overall mean values for the X's being adjusted.

D. Examples of "adjusted" $S(t,\mathbf{X})$ using leukemia
remission data.

VII. The meaning of the PH assumption

A. Hazard ratio formula shows that hazard ratio is
independent of time:

\[ \frac{h(t,\mathbf{X}^*)}{h(t,\mathbf{X})} = \theta \]

B. Baseline hazard function not involved in the HR
formula.

C. Hazards for two X's are proportional:
$h(t,\mathbf{X}^*) = \theta\, h(t,\mathbf{X})$

D. An example when the PH assumption is not
satisfied: hazards cross

VIII. Cox likelihood

A. Lottery Example

B. Likelihood based on order of events

IX. Summary
Practice

Exercises 


1. In a 10-year follow-up study conducted in Evans County,
Georgia, involving persons 60 years or older, one research
question concerned evaluating the relationship of social support
to mortality status. A Cox proportional hazards model
was fit to describe the relationship of a measure of social
network to time until death. The social network index was
denoted as SNI, and took on integer values from 0 (poor
social network) to 5 (excellent social network). Variables to
be considered for control in the analysis as either potential
confounders or potential effect modifiers were AGE (treated
continuously), RACE (0,1), and SEX (0,1).

a. State an initial PH model that can be used to assess the 
relationship of interest, which considers the potential 
confounding and interaction effects of the AGE, RACE, 
and SEX (assume no higher than two-factor products 
involving SNI with AGE, RACE, and SEX). 

b. For your model in part 1a, give an expression for the
hazard ratio that compares a person with SNI = 4 to a
person with SNI = 2 and the same values of the
covariates being controlled.

c. Describe how you would test for interaction using your
model in part 1a. In particular, state the null
hypothesis, the general form of your test statistic, with
its distribution and degrees of freedom under the null
hypothesis.

d. Assuming a revised model containing no interaction
terms, give an expression for a 95% interval estimate
for the adjusted hazard ratio comparing a person with
SNI = 4 to a person with SNI = 2 and the same values
of the covariates in your model.

e. For the no-interaction model described in part 1d, give
an expression (i.e., formula) for the estimated survival
curve for a person with SNI = 4, adjusted for AGE,
RACE, and SEX, where the adjustment uses the overall
mean value for each of the three covariates.

f. Using the no-interaction model described in part 1d, if
the estimated survival curves for persons with SNI = 4
and SNI = 2 adjusted for (mean) AGE, RACE, and SEX
are plotted over time, will these two estimated survival
curves cross? Explain briefly.
2. For this question, we consider the survival data for 137 patients
from the Veterans Administration Lung Cancer Trial
cited by Kalbfleisch and Prentice in their book (The Statistical
Analysis of Failure Time Data, Wiley, 1980). The variables
in this dataset are listed as follows:


Variable#   Variable name        Coding

1           Treatment            Standard = 1, test = 2
2           Cell type 1          Large = 1, other = 0
3           Cell type 2          Adeno = 1, other = 0
4           Cell type 3          Small = 1, other = 0
5           Cell type 4          Squamous = 1, other = 0
6           Survival time        (Days) integer counts
7           Performance status   0 = worst, ..., 100 = best
8           Disease duration     (Months) integer counts
9           Age                  (Years) integer counts
10          Prior therapy        None = 0, some = 10
11          Status               0 = censored, 1 = died

Variables 2 through 5 are the four indicator variables for cell type.

For these data, a Cox PH model was fitted yielding the fol¬ 
lowing edited computer results: 

Response: survival time

Variable name       Coef.    Std. Err.   P > |z|   Haz. Ratio   [95% Conf. interval]

1 Treatment          0.290   0.207       0.162     1.336        0.890    2.006
3 Adeno cell         0.789   0.303       0.009     2.200        1.216    3.982
4 Small cell         0.457   0.266       0.086     1.579        0.937    2.661
5 Squamous cell     -0.400   0.283       0.157     0.671        0.385    1.167
7 Perf. status      -0.033   0.006       0.000     0.968        0.958    0.978
8 Disease dur.       0.000   0.009       0.992     1.000        0.982    1.018
9 Age               -0.009   0.009       0.358     0.991        0.974    1.010
10 Prior therapy     0.007   0.023       0.755     1.007        0.962    1.054

Log likelihood = -475.180


a. State the Cox PH model used to obtain the above 
computer results. 

b. Using the printout above, what is the hazard ratio that 
compares persons with adeno cell type with persons 
with large cell type? Explain your answer using the 
general hazard ratio formula for the Cox PH model. 
c. Using the printout above, what is the hazard ratio that
compares persons with adeno cell type with persons 
with squamous cell type? Explain your answer using 
the general hazard ratio formula for the Cox PH 
model. 

d. Based on the computer results, is there an effect of 
treatment on survival time? Explain briefly. 

e. Give an expression for the estimated survival curve for 
a person who was given the test treatment and who 
had a squamous cell type, where the variables to be 
adjusted are performance status, disease duration, 
age, and prior therapy. 

f. Suppose a revised Cox model is used which contains, 
in addition to the variables already included, the 
product terms: treatment x performance status; 
treatment x disease duration; treatment x age; and 
treatment x prior therapy. For this revised model, give 
an expression for the hazard ratio for the effect of 
treatment, adjusted for the other variables in the 
model. 

3. The data for this question contain survival times of 65 
multiple myeloma patients (references: SPIDA manual, 
Sydney, Australia, 1991; and Krall et al., “A Step-up 
Procedure for Selecting Variables Associated with Survival 
Data,” Biometrics, vol. 31, pp. 49-57, 1975). A partial list of 
the variables in the dataset is given below: 


Variable 1: observation number
Variable 2: survival time (in months) from time of
            diagnosis
Variable 3: survival status (0 = alive, 1 = dead)
Variable 4: platelets at diagnosis (0 = abnormal,
            1 = normal)
Variable 5: age at diagnosis (years)
Variable 6: sex (1 = male, 2 = female)
Below, we provide edited computer results for several
different Cox models that were fit to this dataset. A number
of questions will be asked about these results. 

Model 1:

Variable           Coef.    Std. Err.   p > |z|   Haz. Ratio   [95% Conf. Interval]

Platelets           0.470   2.854       .869      1.600        0.006    429.689
Age                 0.000   0.037       .998      1.000        0.930    1.075
Sex                 0.183   0.725       .801      1.200        0.290    4.969
Platelets x age    -0.008   0.041       .850      0.992        0.915    1.075
Platelets x sex    -0.503   0.804       .532      0.605        0.125    2.924

Log likelihood = -153.040

Model 2:

Platelets          -0.725   0.401       .071      0.484        0.221    1.063
Age                -0.005   0.016       .740      0.995        0.965    1.026
Sex                -0.221   0.311       .478      0.802        0.436    1.476

Log likelihood = -153.253

Model 3:

Platelets          -0.706   0.401       .078      0.493        0.225    1.083
Age                -0.003   0.015       .828      0.997        0.967    1.027

Log likelihood = -153.509

Model 4:

Platelets          -0.705   0.397       .076      0.494        0.227    1.075
Sex                -0.204   0.307       .506      0.815        0.447    1.489

Log likelihood = -153.308

Model 5:

Platelets          -0.694   0.397       .080      0.500        0.230    1.088

Log likelihood = -153.533

a. For model 1, give an expression for the hazard ratio 
for the effect of the platelet variable adjusted for age 
and sex. 

b. Using your answer to part 3a, compute the estimated 
hazard ratio for a 40-year-old male. Also compute the 
estimated hazard ratio for a 50-year-old female. 

c. Carry out an appropriate test of hypothesis to evaluate 
whether there is any significant interaction in model 1. 
What is your conclusion? 

d. Considering models 2-5, evaluate whether age and sex
need to be controlled as confounders.

e. Which of the five models do you think is the best 
model and why? 

f. Based on your answer to part 3e, summarize the 
results that describe the effect of the platelet variable 
on survival adjusted for age and sex. 
Test

1. Consider a hypothetical two-year study to investigate

the effect of a passive smoking intervention program 
on the incidence of upper respiratory infection (URI) in 
newborn infants. The study design involves the random 
allocation of one of three intervention packages (A, B, C) 
to all healthy newborn infants in Orange County, North 
Carolina, during 1985. These infants are followed for two 
years to determine whether or not URI develops. The 
variables of interest for using a survival analysis on these 
data are: 

T = time (in weeks) until URI is detected or time until
censored

s = censorship status (= 1 if URI is detected, = 0 if
censored)

PS = passive smoking index of family during the week of
birth of the infant

DC = daycare status (= 1 if outside daycare, = 0 if only
daycare is in home)

BF = breastfeeding status (= 1 if infant is breastfed, = 0
if infant is not breastfed)

$T_1$ = first dummy variable for intervention status (= 1 if
A, = 0 if B, = -1 if C)

$T_2$ = second dummy variable for intervention status (= 1
if B, = 0 if A, = -1 if C).

a. State the Cox PH model that would describe the 
relationship between intervention package and survival 
time, controlling for PS, DC, and BF as confounders 
and effect modifiers. In defining your model, use only 
two factor product terms involving exposure (i.e., 
intervention) variables multiplied by control variables 
in your model. 

b. Assuming that the Cox PH model is appropriate, give a 
formula for the hazard ratio that compares a person in 
intervention group A with a person in intervention 
group C, adjusting for PS, DC, and BF, and assuming 
interaction effects. 

c. Assuming that the PH model in part 1a is appropriate,
describe how you would carry out a chunk test for 
interaction; i.e., state the null hypothesis, describe the 
test statistic and give the distribution of the test 
statistic and its degrees of freedom under the null 
hypothesis. 
d. Assuming no interaction effects, how would you test
whether packages A, B, and C are equally effective, 
after controlling for PS, DC, and BF in a Cox PH model 
without interaction terms (i.e., state the two models 
being compared, the null hypothesis, the test statistic, 
and the distribution of the test statistic under the null 
hypothesis). 

e. For the no-interaction model considered in parts 1c
and 1d, give an expression for the estimated survival
curves for the effect of intervention A adjusted for PS, 
DC, and BF. Also, give similar (but different) 
expressions for the adjusted survival curves for 
interventions B and C. 

2. The data for this question consist of a sample of 50 persons
from the 1967-1980 Evans County Study. There are
two basic independent variables of interest: AGE and
chronic disease status (CHR), where CHR is coded as
0 = none, 1 = chronic disease. A product term of the form
AGE x CHR is also considered. The dependent variable
is time until death, and the event is death. The primary
question of interest concerns whether CHR, considered
as the exposure variable, is related to survival time, controlling
for AGE. The edited output of computer results
for this question is given as follows:

Model 1:

Variable     Coef.     Std. Err.   Chi-sq   P > |z|

CHR           0.8595   0.3116       7.61    .0058

Log likelihood = -142.87

Model 2:

CHR           0.8051   0.3252       6.13    .0133
AGE           0.0856   0.0193      19.63    .0000

Log likelihood = -132.45

Model 3:

CHR           1.0009   2.2556       0.20    .6572
AGE           0.0874   0.0276      10.01    .0016
CHR x AGE    -0.0030   0.0345       0.01    .9301

Log likelihood = -132.35








a. State the Cox PH model that allows for main effects of 
CHR and AGE as well as the interaction effect of CHR 
with AGE. 

b. Carry out the test for significant interaction; i.e., state 
the null hypothesis, the test statistic, and its 
distribution under the null hypothesis. What are your 
conclusions about interaction? 

c. Assuming no interaction, should AGE be controlled? 
Explain your answer on the basis of confounding 
and/or precision considerations. 

d. If, when considering plots of various hazard functions 
over time, the hazard function for persons with 
CHR = 1 crosses the hazard function for persons with 
CHR = 0, what does this indicate about the use of any 
of the three models provided in the printout? 

e. Using model 2, give an expression for the estimated 
survival curve for persons with CHR = 1, adjusted for 
AGE. Also, give an expression for the estimated survival 
curve for persons with CHR = 0, adjusted for AGE. 

f. What is your overall conclusion about the effect of 
CHR on survival time based on the computer results 
provided from this study? 

3. The data for this question contain remission times of 

42 multiple leukemia patients in a clinical trial of a new 

treatment. The variables in the dataset are given below: 


Variable 1: survival time (in weeks)
Variable 2: status (1 = in remission, 0 = relapse)
Variable 3: sex (1 = female, 0 = male)
Variable 4: log WBC
Variable 5: Rx status (1 = placebo, 0 = treatment)
Below, we provide computer results for several different
Cox models that were fit to this dataset. A number of questions
will be asked about these results starting below.


Model 1:

Variable        Coef.    Std. Err.   P > |z|   Haz. Ratio   [95% Conf. Interval]

Rx               0.894   1.815       .622      2.446        0.070    85.812
Sex             -1.012   0.752       .178      0.363        0.083    1.585
log WBC          1.693   0.441       .000      5.437        2.292    12.897
Rx x Sex         1.952   0.907       .031      7.046        1.191    41.702
Rx x log WBC    -0.151   0.531       .776      0.860        0.304    2.433

Log likelihood = -69.515

Model 2:

Rx               0.405   0.561       .470      1.500        0.499    4.507
Sex             -1.070   0.725       .140      0.343        0.083    1.422
log WBC          1.610   0.332       .000      5.004        2.610    9.592
Rx x Sex         2.013   0.883       .023      7.483        1.325    42.261

Log likelihood = -69.555

Model 3:

Rx               0.587   0.542       .279      1.798        0.621    5.202
Sex             -1.073   0.701       .126      0.342        0.087    1.353
Rx x Sex         1.906   0.815       .019      6.726        1.362    33.213

Log likelihood = -83.475

Model 4:

Rx               1.391   0.457       .002      4.018        1.642    9.834
Sex              0.263   0.449       .558      1.301        0.539    3.139
log WBC          1.594   0.330       .000      4.922        2.578    9.397

Log likelihood = -72.109

a. Use the above computer results to carry out a chunk 
test to evaluate whether the two interaction terms in 
model 1 are significant. What are your conclusions? 

b. Evaluate whether you would prefer model 1 or 
model 2. Explain your answer. 

c. Using model 2, give an expression for the hazard ratio 
for the effect of the Rx variable adjusted for SEX and 
log WBC. 

d. Using your answer in part 3c, compute the hazard ratio 
for the effect of Rx for males and for females separately. 

e. By considering the potential confounding of log WBC, 
determine which of models 2 and 3 you prefer. Explain. 

f. Of the models provided, which model do you consider
to be best? Explain.
Answers to

Practice 

Exercises 


1. a. $h(t,\mathbf{X}) = h_0(t)\exp[\beta_1 SNI + \beta_2 AGE + \beta_3 RACE
+ \beta_4 SEX + \beta_5 SNI \times AGE + \beta_6 SNI \times RACE
+ \beta_7 SNI \times SEX]$

b. $HR = \exp[2\beta_1 + 2(AGE)\beta_5 + 2(RACE)\beta_6 + 2(SEX)\beta_7]$

c. $H_0: \beta_5 = \beta_6 = \beta_7 = 0$

Likelihood ratio test statistic: $-2 \ln L_R - (-2 \ln L_F)$,
which is approximately $\chi^2_3$ under $H_0$, where R
denotes the reduced model (containing no product
terms) under $H_0$, and F denotes the full model (given
in part 1a above).

d. 95% CI for adjusted HR:

\[ \exp\left[2\hat{\beta}_1 \pm 1.96 \times 2\sqrt{\widehat{\mathrm{var}}(\hat{\beta}_1)}\right] \]

e. $\hat{S}(t,\mathbf{X}) = [\hat{S}_0(t)]^{\exp[4\hat{\beta}_1 + (\overline{AGE})\hat{\beta}_2 + (\overline{RACE})\hat{\beta}_3 + (\overline{SEX})\hat{\beta}_4]}$

f. The two survival curves will not cross, because both
are computed using the same proportional hazards
model, which has the property that the hazard
functions, as well as their corresponding estimated
survivor functions, will not cross.
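
The non-crossing property in part f follows directly from the adjusted survival curve formula, and a short numerical sketch makes it concrete. Everything numeric here is hypothetical: the baseline survival values and the SNI coefficient are made up for illustration and are not estimates from the Evans County data. The helper name `adjusted_survival` is also ours.

```python
import math

# Hypothetical baseline survival values at successive follow-up times
# (illustrative only, not taken from the text).
BASELINE_SURVIVAL = [1.0, 0.95, 0.85, 0.70, 0.50, 0.30]


def adjusted_survival(baseline, linear_predictor):
    """S(t, X) = S0(t) ** exp(linear predictor), evaluated pointwise."""
    multiplier = math.exp(linear_predictor)
    return [s0 ** multiplier for s0 in baseline]


# Hypothetical coefficient for SNI. AGE, RACE, and SEX are held at their means
# in both curves, so they add the same constant to both linear predictors;
# that constant is taken as 0 here for simplicity.
BETA_SNI = -0.2
curve_sni4 = adjusted_survival(BASELINE_SURVIVAL, 4 * BETA_SNI)
curve_sni2 = adjusted_survival(BASELINE_SURVIVAL, 2 * BETA_SNI)

differences = [a - b for a, b in zip(curve_sni4, curve_sni2)]
print(differences)
```

Both curves raise the same baseline values to fixed, different powers, so the sign of the difference between them is the same at every time point; the curves can touch only where the baseline survival equals 1.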


2. a. $h(t,\mathbf{X}) = h_0(t)\exp[\beta_1 X_1 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5
+ \beta_7 X_7 + \cdots + \beta_{10} X_{10}]$

b. Adeno cell type: X* = (treatment, 1, 0, 0, perfstat,
disdur, age, prther)

Large cell type: X = (treatment, 0, 0, 0, perfstat,
disdur, age, prther)

\[ HR = \frac{h(t,\mathbf{X}^*)}{h(t,\mathbf{X})} = \exp\left[\sum_{i=1}^{p} \beta_i (X_i^* - X_i)\right] \]

= exp[0 + $\beta_3$(1 - 0) + $\beta_4$(0 - 0)
+ $\beta_5$(0 - 0) + 0 + ... + 0]

= exp[$\beta_3$] = exp[0.789] = 2.20

c. Adeno cell type: X* = (treatment, 1, 0, 0, perfstat,
disdur, age, prther)

Squamous cell type: X = (treatment, 0, 0, 1, perfstat,
disdur, age, prther)

\[ HR = \frac{h(t,\mathbf{X}^*)}{h(t,\mathbf{X})} = \exp\left[\sum_{i=1}^{p} \beta_i (X_i^* - X_i)\right] \]

= exp[0 + $\beta_3$(1 - 0) + $\beta_4$(0 - 0)
+ $\beta_5$(0 - 1) + 0 + ... + 0]

= exp[$\beta_3 - \beta_5$] = exp[0.789
- (-0.400)] = exp[1.189] = 3.28
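
The arithmetic in parts b and c can be reproduced by applying the general hazard ratio formula to the printed coefficients. This is a minimal sketch: the dictionary keys and the function name `hazard_ratio` are our own labels, and covariates that take the same value in both groups are simply left out, since their differences are zero.

```python
import math

# Coefficients copied from the printout in practice exercise 2.
COEF = {
    "treatment": 0.290, "adeno": 0.789, "small": 0.457, "squamous": -0.400,
    "perfstat": -0.033, "disdur": 0.000, "age": -0.009, "prther": 0.007,
}


def hazard_ratio(coef, x_star, x):
    """exp of sum_i beta_i * (X*_i - X_i).

    Variables absent from a dict are treated as 0. Covariates shared by both
    groups contribute a zero difference, so they may be omitted from both.
    """
    total = sum(b * (x_star.get(k, 0) - x.get(k, 0)) for k, b in coef.items())
    return math.exp(total)


adeno = {"adeno": 1}
large = {}  # large cell is the reference group: all three indicators are 0
squamous = {"squamous": 1}

hr_adeno_vs_large = hazard_ratio(COEF, adeno, large)
hr_adeno_vs_squamous = hazard_ratio(COEF, adeno, squamous)
print(round(hr_adeno_vs_large, 2), round(hr_adeno_vs_squamous, 2))
```

The two printed values should agree with the hand calculations of 2.20 and 3.28 above.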
d. There does not appear to be an effect of treatment on
survival time, adjusted for the other variables in the
model. The hazard ratio is 1.3, which is close to the
null value of one; the p-value of 0.162 for the Wald
test for treatment is not significant; and the 95%
confidence interval for the treatment effect
correspondingly includes the null value.

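e. $\hat{S}(t,\mathbf{X}) = [\hat{S}_0(t)]^{\exp[2\hat{\beta}_1 + \hat{\beta}_5 + (\overline{perfstat})\hat{\beta}_7 + (\overline{disdur})\hat{\beta}_8 + (\overline{age})\hat{\beta}_9 + (\overline{prther})\hat{\beta}_{10}]}$

f. $HR = \frac{h(t,\mathbf{X}^*)}{h(t,\mathbf{X})} = \exp[\beta_1 + (perfstat)\beta_{11} + (disdur)\beta_{12}
+ (age)\beta_{13} + (prther)\beta_{14}]$

where $\beta_1$ is the coefficient of the treatment variable
and $\beta_{11}$, $\beta_{12}$, $\beta_{13}$, and $\beta_{14}$ are the coefficients of
product terms involving treatment with the four
variables indicated.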

3. a. HR = exp[0.470 + (-0.008)age + (-0.503)sex]

b. 40-year-old male:

HR = exp[0.470 + (-0.008)40 + (-0.503)1] = 0.70

50-year-old female:

HR = exp[0.470 + (-0.008)50 + (-0.503)2] = 0.39
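
When the model contains product terms, the hazard ratio for platelets is a function of the effect modifiers rather than a single number. The sketch below evaluates the expression from part 3a at the two covariate patterns in part 3b; the function name `platelet_hazard_ratio` is ours, and the coefficients are those printed for model 1.

```python
import math


def platelet_hazard_ratio(age, sex):
    """HR for platelets under model 1: exp[0.470 - 0.008*age - 0.503*sex].

    sex is coded 1 = male, 2 = female, as in the dataset description.
    """
    return math.exp(0.470 + (-0.008) * age + (-0.503) * sex)


hr_male_40 = platelet_hazard_ratio(age=40, sex=1)
hr_female_50 = platelet_hazard_ratio(age=50, sex=2)
print(round(hr_male_40, 2), round(hr_female_50, 2))
```

The two values should match the 0.70 and 0.39 computed by hand above.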

c. The LR (chunk) test for the significance of both 
interaction terms simultaneously yields the following 
likelihood ratio statistic which compares models 1 
and 2: 

LR = [(-2 x -153.253) - (-2 x -153.040)] 

= 306.506 - 306.080 = 0.426 

This statistic is approximately chi-square with
2 degrees of freedom under the null hypothesis of no
interaction. This LR statistic is highly nonsignificant.
Thus, we conclude that there is no significant
interaction in model 1.
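
The chunk test can be verified in a few lines. The sketch below recomputes the LR statistic from the two log likelihoods and converts it to a P-value. It relies on the fact that the upper-tail probability of a chi-square variable with 2 degrees of freedom is exp(-x/2), so no statistics library is needed; that shortcut applies only to 2 degrees of freedom, and the function names are ours.

```python
import math


def likelihood_ratio_statistic(loglik_reduced, loglik_full):
    """LR = -2 ln L_R - (-2 ln L_F)."""
    return -2.0 * loglik_reduced - (-2.0 * loglik_full)


def chi_square_2df_p_value(statistic):
    """Upper-tail probability of a chi-square with 2 df: exp(-x / 2)."""
    return math.exp(-statistic / 2.0)


# Model 2 (no interaction) is the reduced model; model 1 is the full model.
lr = likelihood_ratio_statistic(loglik_reduced=-153.253, loglik_full=-153.040)
p_value = chi_square_2df_p_value(lr)
print(round(lr, 3), round(p_value, 2))
```

The P-value of roughly 0.8 is what "highly nonsignificant" refers to in the answer above.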

d. The gold-standard hazard ratio is 0.484, which is 
obtained for model 2. Note that model 2 contains no 
interaction terms and controls for both covariates of 
interest. When either age or sex or both are dropped 
from the model, the hazard ratio (for platelets) does 
not change appreciably. Therefore, it appears that 
neither age nor sex need to be controlled for 
confounding. 
e. Models 2-5 are all more or less equivalent, since they
all give essentially the same hazard ratio and
confidence interval for the effect of the platelet 
variable. A political choice for best model would be 
the gold-standard model (2), because the critical 
reviewer can see both age and sex being controlled in 
model 2. 

f. • The point estimate of the hazard ratio for 

normal versus abnormal platelet count is 
0.484 = 1/2.07, so that the hazard for an
abnormal count is about twice that for a normal
count.

• There is a borderline significant effect of 
platelet count on survival adjusted for age 
and sex (P = .071). 

• The 95% CI for the hazard ratio is given by
0.221 < HR < 1.063, which is quite wide and 
therefore shows a very imprecise estimate. 


Evaluating
the Proportional
Hazards
Assumption
Introduction


Abbreviated 

Outline 


We begin with a brief review of the characteristics of the Cox
proportional hazards (PH) model. We then give an overview
of three methods for checking the PH assumption: graphical,
goodness-of-fit (GOF), and time-dependent variable
approaches.

We then focus on each of the above approaches, starting with
graphical methods. The most popular graphical approach involves
the use of "log-log" survival curves. A second graphical
approach involves the comparison of "observed" with "expected"
survival curves.

The GOF approach uses a test statistic or equivalent p-value 
to assess the significance of the PH assumption. We illustrate 
this test and describe some of its advantages and drawbacks. 

Finally, we discuss the use of time-dependent variables in an
extended Cox model as a third method for checking the PH
assumption. A more detailed description of the use of
time-dependent variables is provided in Chapter 6.


The outline below gives the user a preview of the material to 
be covered by the presentation. A detailed outline for review 
purposes follows the presentation. 

I. Background

II. Checking the PH assumption: Overview

III. Graphical approach 1: log-log plots

IV. Graphical approach 2: observed versus expected plots

V. The goodness-of-fit (GOF) testing approach

VI. Assessing the PH assumption using time-dependent covariates

Objectives


Upon completing this chapter, the learner should be able to: 

1. State or recognize three general approaches for evaluating 
the PH assumption. 

2. Summarize how log-log survival curves may be used to 
assess the PH assumption. 

3. Summarize how observed versus expected plots may be 
used to assess the PH assumption. 

4. Summarize how GOF tests may be used to assess the PH 
assumption. 

5. Summarize how time-dependent variables may be used 
to assess the PH assumption. 

6. Describe, given survival data or computer output from a
survival analysis that uses a Cox PH model, how to assess
the PH assumption for one or more variables in the model
using:

a. a graphical approach 

b. the GOF approach 

c. an extended Cox model with time-dependent 
covariates 

7. State the formula for an extended Cox model that provides
a method for checking the PH assumption for one
or more of the time-independent variables in the model,
given survival analysis data or computer output that uses
a Cox PH model.

Presentation



This presentation describes three approaches for
evaluating the proportional hazards (PH) assumption
of the Cox model: a graphical procedure, a
goodness-of-fit testing procedure, and a procedure
that involves the use of time-dependent variables.


I. Background 


Cox PH model:

$$h(t,\mathbf{X}) = h_0(t)\, e^{\sum_{i=1}^{p} \beta_i X_i}$$

$\mathbf{X} = (X_1, X_2, \ldots, X_p)$: explanatory/predictor variables


Recall from the previous chapter that the general
form of the Cox PH model gives an expression for
the hazard at time t for an individual with a given
specification of a set of explanatory variables denoted
by the bold $\mathbf{X}$.

The Cox model formula says that the hazard at
time t is the product of two quantities. The first
of these, $h_0(t)$, is called the baseline hazard function.
The second quantity is the exponential expression
e to the linear sum of $\beta_i X_i$, where the
sum is over the p explanatory X variables.


$$\underbrace{h_0(t)}_{\text{baseline hazard}} \times \underbrace{e^{\sum_{i=1}^{p} \beta_i X_i}}_{\text{exponential}}$$

Baseline hazard: involves t but not X's
Exponential: involves X's but not t (X's are time-independent)


An important feature of this formula, which concerns
the proportional hazards (PH) assumption,
is that the baseline hazard is a function of t, but
does not involve the X's, whereas the exponential
expression involves the X's, but does not involve t.
The X's here are called time-independent X's.


X's involving t: time-dependent

Requires extended Cox model
(no PH) → Chapter 6


It is possible, nevertheless, to consider X's that
do involve t. Such X's are called time-dependent
variables. If time-dependent variables are considered,
the Cox model form may still be used, but
such a model no longer satisfies the PH assumption,
and is called the extended Cox model. We
will discuss this extended Cox model in Chapter 6
of this series.

Hazard ratio formula:

$$\widehat{HR} = \exp\left[\sum_{i=1}^{p} \hat{\beta}_i \left(X_i^* - X_i\right)\right]$$

where $\mathbf{X}^* = (X_1^*, X_2^*, \ldots, X_p^*)$
and $\mathbf{X} = (X_1, X_2, \ldots, X_p)$
denote the two sets of X's.


From the Cox PH model, we can obtain a gen¬ 
eral formula, shown here, for estimating a hazard 
ratio that compares two specifications of the X’s, 
defined as X* and X. 
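
As a minimal sketch of this hazard ratio calculation, the following Python function exponentiates the sum of each coefficient times the difference in predictor values. The function name, the coefficient values, and the two predictor specifications are hypothetical and chosen only for illustration; they are not estimates from any data set in this text.

```python
import math

def hazard_ratio(beta, x_star, x):
    """Return exp(sum_i beta_i * (x_star_i - x_i))."""
    if not (len(beta) == len(x_star) == len(x)):
        raise ValueError("beta, x_star and x must have the same length")
    linear_diff = sum(b * (xs - xi) for b, xs, xi in zip(beta, x_star, x))
    return math.exp(linear_diff)

# Hypothetical coefficients for two predictors (an exposure and a covariate).
beta_hat = [1.3, 1.6]
x_star = [1, 2.9]   # X*: exposed, covariate = 2.9
x = [0, 2.9]        # X : unexposed, same covariate value

# Only the exposure differs, so the result equals exp(1.3).
print(hazard_ratio(beta_hat, x_star, x))
```

Because the two specifications share the same covariate value, that term cancels and only the exposure coefficient contributes to the ratio.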


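Adjusted survival curves:

Comparing E groups (E = 0 or 1):

$$\hat{S}(t,\mathbf{X}) = \left[\hat{S}_0(t)\right]^{\exp\left[\hat{\beta}_1 E + \sum_{i \neq 1} \hat{\beta}_i \bar{X}_i\right]}$$

Single curve:

$$\hat{S}(t,\bar{\mathbf{X}}) = \left[\hat{S}_0(t)\right]^{\exp\left[\sum_{i} \hat{\beta}_i \bar{X}_i\right]}$$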


We can also obtain from the Cox model an expres¬ 
sion for an adjusted survival curve. Here we show 
a general formula for obtaining adjusted survival 
curves comparing two groups adjusted for other 
variables in the model. Below this, we give a for¬ 
mula for a single adjusted survival curve that ad¬ 
justs for all X’s in the model. Computer packages 
for these formulae use the mean value of each X 
being adjusted in the computation of the adjusted 
curve. 
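
The adjusted survival curve formula can be sketched in a few lines of Python: the baseline survival probability at each time is raised to the power exp(sum of coefficient times predictor value). The baseline values, coefficients, and mean used below are invented for illustration only, and the function name is our own.

```python
import math

def adjusted_survival(s0, beta, x_values):
    """Return [S0(t)] ** exp(sum_i beta_i * x_i) for each baseline value in s0."""
    risk_score = math.exp(sum(b * x for b, x in zip(beta, x_values)))
    return [s ** risk_score for s in s0]

# Hypothetical baseline survival probabilities at successive event times.
s0_hat = [0.95, 0.85, 0.70]
beta_hat = [1.3, 1.6]   # hypothetical coefficients for (E, X2)
mean_x2 = 2.9           # hypothetical mean of the variable being adjusted

curve_e1 = adjusted_survival(s0_hat, beta_hat, [1, mean_x2])  # E = 1
curve_e0 = adjusted_survival(s0_hat, beta_hat, [0, mean_x2])  # E = 0
```

Both curves use the same mean for the adjusted variable, so they differ only through the value of E.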


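PH assumption:

$$\frac{\hat{h}(t,\mathbf{X}^*)}{\hat{h}(t,\mathbf{X})} = \hat{\theta}, \quad \text{constant over } t$$

i.e., $\hat{h}(t,\mathbf{X}^*) = \hat{\theta}\,\hat{h}(t,\mathbf{X})$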


The Cox PH model assumes that the hazard ratio 
comparing any two specifications of predictors is 
constant over time. Equivalently, this means that 
the hazard for one individual is proportional to the 
hazard for any other individual, where the propor¬ 
tionality constant is independent of time. 
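
To see why the Cox PH model implies a constant hazard ratio, we can write out the ratio of the two hazards from the model formula. The baseline hazard cancels, leaving an expression that involves the predictor values but not t:

```latex
\frac{h(t,\mathbf{X}^*)}{h(t,\mathbf{X})}
  = \frac{h_0(t)\, e^{\sum_{i=1}^{p} \beta_i X_i^*}}
         {h_0(t)\, e^{\sum_{i=1}^{p} \beta_i X_i}}
  = e^{\sum_{i=1}^{p} \beta_i \left(X_i^* - X_i\right)}
  = \theta
```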


Hazards cross ⇒ PH not met

Hazards don’t cross ⇏ PH met


The PH assumption is not met if the graphs of
the hazards cross for two or more categories of a
predictor of interest. However, even if the hazard
functions do not cross, it is possible that the PH
assumption is not met. Thus, rather than checking
for crossing hazards, we must use other approaches
to evaluate the reasonableness of the PH
assumption.


II. Checking the Proportional Hazards Assumption: Overview

Three approaches:

• graphical

• goodness-of-fit test

• time-dependent variables

There are three general approaches for assessing
the PH assumption, again listed here. We
now briefly overview each approach, starting with
graphical techniques.

Graphical techniques:

−ln(−ln) S curves parallel?

[Figure: −ln(−ln) S curves plotted against time for males and females]


There are two types of graphical techniques avail¬ 
able. The most popular of these involves compar¬ 
ing estimated -ln(-ln) survivor curves over dif¬ 
ferent (combinations of) categories of variables 
being investigated. We will describe such curves 
in detail in the next section. Parallel curves, say 
comparing males with females, indicate that the 
PH assumption is satisfied, as shown in this illus¬ 
tration for the variable Sex. 


Observed vs. predicted: Close?



An alternative graphical approach is to compare 
observed with predicted survivor curves. The ob¬ 
served curves are derived for categories of the vari¬ 
able being assessed, say, Sex, without putting this 
variable in a PH model. The predicted curves are 
derived with this variable included in a PH model. 
If observed and predicted curves are close, then 
the PH assumption is reasonable. 


[Figure: survival curves (S versus time): predicted for males (sex in model) and observed for males]

Goodness-of-fit (GOF) tests: 

• Large sample Z or chi-square 
statistics 

• Gives p-value for evaluating PH 
assumption for each variable in 
the model. 

p-value large ⇒ PH satisfied
(e.g., P > 0.10)

p-value small ⇒ PH not satisfied
(e.g., P < 0.05)


A second approach for assessing the PH assumption
involves goodness-of-fit (GOF) tests. This approach
provides large sample Z or chi-square
statistics which can be computed for each variable
in the model, adjusted for the other variables
in the model. A p-value derived from a standard
normal statistic is also given for each variable.
This p-value is used for evaluating the PH assumption
for that variable. A nonsignificant (i.e., large)
p-value, say greater than 0.10, suggests that the
PH assumption is reasonable, whereas a small
p-value, say less than 0.05, suggests that the variable
being tested does not satisfy this assumption.


Time-dependent covariates:

Extended Cox model:

Add product term involving some
function of time.


When time-dependent variables are used to assess
the PH assumption for a time-independent variable,
the Cox model is extended to contain product
(i.e., interaction) terms involving the time-independent
variable being assessed and some
function of time.

For example, if the PH assumption is being assessed
for Sex, a Cox model might be extended to
include the variable “Sex × t” in addition to Sex. If
the coefficient of the product term turns out to be
significant, we can conclude that the PH assumption
is violated for Sex.
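
As a sketch of what such an extended model might look like for Sex, suppose the function of time is taken to be t itself. The coefficient symbols β and δ are labels chosen here for illustration:

```latex
h(t,\mathbf{X}(t)) = h_0(t)\,\exp\left[\beta\,\mathrm{Sex} + \delta\,(\mathrm{Sex} \times t)\right],
\qquad
HR(t) = \exp(\beta + \delta t) \quad \text{for Sex} = 1 \text{ versus Sex} = 0
```

Under this form, the hazard ratio is free of t only when δ = 0, which is why a significant product term indicates a departure from the PH assumption.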


GOF provides test statistic 
Graphical: subjective 
Time-dependent: computationally 
cumbersome 

GOF: global, may not detect 

specific departures from PH 


The GOF approach provides a single test statis¬ 
tic for each variable being assessed. This ap¬ 
proach is not as subjective as the graphical ap¬ 
proach nor as cumbersome computationally as 
the time-dependent variable approach. Neverthe¬ 
less, a GOF test may be too “global” in that it 
may not detect specific departures from the PH 
assumption that may be observed from the other 
two approaches. 


III. Graphical Approach 1: Log-Log Plots

• log-log survival curves 

• observed versus expected 
survival curves 


The two graphical approaches for checking the PH
assumption are comparing log-log survival curves
and comparing observed versus expected survival
curves. We first explain what a −ln(−ln) survival
curve is and how it is used.


log-log $\hat{S}$ = transformation of $\hat{S}$
= $-\ln(-\ln \hat{S})$

• $\ln \hat{S}$ is negative ⇒ $-(\ln \hat{S})$ is
positive.

• can’t take log of $\ln \hat{S}$, but can
take log of $(-\ln \hat{S})$.

• $-\ln(-\ln \hat{S})$ may be positive or
negative.


A log-log survival curve is simply a transformation
of an estimated survival curve that results
from taking the natural log of an estimated survival
probability twice. Mathematically, we write a
log-log curve as $-\ln(-\ln \hat{S})$. Note that the log of a
probability such as $\hat{S}$ is always a negative number.
Because we can only take logs of positive numbers,
we need to negate the first log before taking
the second log. The value for $-\ln(-\ln \hat{S})$ may be
positive or negative, either of which is acceptable,
because we are not taking a third log.¹


¹An equivalent way to write $-\ln(-\ln S)$ is $-\ln\left(\int_0^t h(u)\,du\right)$,
where $\int_0^t h(u)\,du$ is called the “cumulative hazard” function.
This result follows from the formula $S(t) = \exp\left[-\int_0^t h(u)\,du\right]$,
which relates the survivor function to the hazard function (see
p. 14 in Chapter 1).

As an example, in the graph at left, the estimated
survival probability of 0.54 is transformed to a log-log
value of 0.484. Similarly, the point 0.25 on the
survival curve is transformed to a −ln(−ln) value
of −0.327.


Note that because the survival curve is usually 
plotted as a step function, so will the log-log curve 
be plotted as a step function. 



To illustrate the computation of a log-log value,
suppose we start with an estimated survival probability
of 0.54. Then the log-log transformation of
this value is $-\ln(-\ln 0.54)$, which is $-\ln(0.616)$,
because $\ln(0.54)$ equals −0.616. Now, continuing
further, $-\ln(0.616)$ equals 0.484, because
$\ln(0.616)$ equals −0.484. Thus, the transformation
$-\ln(-\ln 0.54)$ equals 0.484.



As another example, if the estimated survival
probability is 0.25, then $-\ln(-\ln 0.25)$ equals
$-\ln(1.386)$, which equals −0.327.
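
The two computations above can be checked with a short Python sketch. The function name `log_log` is our own; the input values 0.54 and 0.25 are the ones used in the examples.

```python
import math

def log_log(s):
    """Return -ln(-ln s) for a survival probability strictly between 0 and 1."""
    if not 0.0 < s < 1.0:
        raise ValueError("s must lie strictly between 0 and 1")
    return -math.log(-math.log(s))

for s in (0.54, 0.25):
    # Rounded to three decimals: 0.484 for 0.54 and -0.327 for 0.25.
    print(s, round(log_log(s), 3))
```

The guard on the input reflects the fact that the transformation is undefined at exactly 0 or 1, where one of the two logs cannot be taken.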


y-axis scale:

$\hat{S}$: 0 to 1
−ln(−ln) $\hat{S}$: −∞ to +∞


Note that the scale of the y-axis of an estimated
survival curve ranges between 0 and 1, whereas the
corresponding scale for a −ln(−ln) curve ranges
between −∞ and +∞.


log-log $\hat{S}$ for the Cox PH model:

We now show why the PH assumption can be assessed
by evaluating whether or not log-log curves
are parallel. To do this, we must first describe the
log-log formula for the Cox PH model.


































Cox PH hazard function:

$$h(t,\mathbf{X}) = h_0(t)\, e^{\sum_{i=1}^{p} \beta_i X_i}$$

↓ From math

Cox PH survival function:

$$S(t,\mathbf{X}) = \left[S_0(t)\right]^{e^{\sum_{i=1}^{p} \beta_i X_i}}$$

where $S_0(t)$ is the baseline survival function.


We start with the formula for the survival curve
that corresponds to the hazard function for the
Cox PH model. Recall that there is a mathematical
relationship between any hazard function and
its corresponding survival function. We therefore
can obtain the formula shown here for the survival
curve for the Cox PH model. In this formula,
the expression $S_0(t)$ denotes the baseline survival
function that corresponds to the baseline hazard
function $h_0(t)$.


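log-log ⇒ takes logs twice

log #1:

$$\ln S(t,\mathbf{X}) = e^{\sum_{i=1}^{p} \beta_i X_i} \times \ln S_0(t)$$

$0 < S(t,\mathbf{X}) < 1$

ln(probability) = negative value,
so $\ln S(t,\mathbf{X})$ and $\ln S_0(t)$ are
negative.

But $-\ln S(t,\mathbf{X})$ is positive, which
allows us to take logs again.

log #2:

$$\begin{aligned}
\ln[-\ln S(t,\mathbf{X})] &= \ln\left[-e^{\sum_{i=1}^{p} \beta_i X_i} \times \ln S_0(t)\right] \\
&= \ln\left[e^{\sum_{i=1}^{p} \beta_i X_i}\right] + \ln[-\ln S_0(t)] \\
&= \sum_{i=1}^{p} \beta_i X_i + \ln[-\ln S_0(t)]
\end{aligned}$$

$$-\ln[-\ln S(t,\mathbf{X})] = -\sum_{i=1}^{p} \beta_i X_i - \ln[-\ln S_0(t)]$$

or

$$\ln[-\ln S(t,\mathbf{X})] = +\sum_{i=1}^{p} \beta_i X_i + \ln[-\ln S_0(t)]$$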

The log-log formula requires us to take logs of this
survival function twice. The first time we take logs
we get the expression shown here.

Now since $S(t,\mathbf{X})$ denotes a survival probability, its
value for any t and any specification of the vector
$\mathbf{X}$ will be some number between 0 and 1. It follows
that the natural log of any number between 0 and
1 is a negative number, so that the log of $S(t,\mathbf{X})$ as
well as the log of $S_0(t)$ are both negative numbers.
This is why we have to put a minus sign in front of
this expression before we can take logs a second
time, because there is no such thing as the log of
a negative number.

Thus, when taking the second log, we must obtain
the log of $-\ln S(t,\mathbf{X})$, as shown here. After using
some algebra, this expression can be rewritten as
the sum of two terms, one of which is the linear
sum of the $\beta_i X_i$ and the other is the log of the
negative log of the baseline survival function.


This second log may be either positive or negative,
and we aren't taking any more logs, so we
actually don’t have to take a second negative. However,
for consistency's sake, a common practice is
to put a minus sign in front of the second log to obtain
the −ln(−ln) expression shown here. Nevertheless,
some software packages do not use a second
minus sign.

Two individuals:

$\mathbf{X}_1 = (X_{11}, X_{12}, \ldots, X_{1p})$
$\mathbf{X}_2 = (X_{21}, X_{22}, \ldots, X_{2p})$


Now suppose we consider two different specifications
of the X vector, corresponding to two different
individuals, $\mathbf{X}_1$ and $\mathbf{X}_2$.


$$-\ln[-\ln S(t,\mathbf{X}_1)] = -\sum_{i=1}^{p} \beta_i X_{1i} - \ln[-\ln S_0(t)]$$

$$-\ln[-\ln S(t,\mathbf{X}_2)] = -\sum_{i=1}^{p} \beta_i X_{2i} - \ln[-\ln S_0(t)]$$


Then the corresponding log-log curves for these
individuals are given as shown here, where we
have simply substituted $\mathbf{X}_1$ and $\mathbf{X}_2$ for $\mathbf{X}$ in the
previous expression for the log-log curve for any
individual $\mathbf{X}$.


$$-\ln[-\ln S(t,\mathbf{X}_1)] - \left(-\ln[-\ln S(t,\mathbf{X}_2)]\right) = \sum_{i=1}^{p} \beta_i \left(X_{2i} - X_{1i}\right)$$

(does not involve t)

$$-\ln[-\ln S(t,\mathbf{X}_1)] = -\ln[-\ln S(t,\mathbf{X}_2)] + \sum_{i=1}^{p} \beta_i \left(X_{2i} - X_{1i}\right)$$


Subtracting the second log-log curve from the first
yields the expression shown here. This expression
is a linear sum of the differences in corresponding
predictor values for the two individuals. Note that
the baseline survival function has dropped out, so
that the difference in log-log curves involves an
expression that does not involve time t.

Alternatively, using algebra, we can write the
above equation by expressing the log-log survival
curve for individual $\mathbf{X}_1$ as the log-log curve for
individual $\mathbf{X}_2$ plus a linear sum term that is independent
of t.



The above formula says that if we use a Cox PH 
model and we plot the estimated log-log survival 
curves for individuals on the same graph, the two 
plots would be approximately parallel. The dis¬ 
tance between the two curves is the linear expres¬ 
sion involving the differences in predictor values, 
which does not involve time. Note, in general, if 
the vertical distance between two curves is con¬ 
stant, then the curves are parallel. 
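
The constant vertical distance can be checked numerically. The sketch below builds Cox-model survival probabilities from a made-up baseline survival function, applies the log-log transformation for two hypothetical individuals, and compares the gap at each time with the linear sum of coefficient times predictor difference. All numbers and names are assumptions for illustration.

```python
import math

def cox_log_log(s0_t, beta, x):
    """-ln(-ln S(t,X)) where S(t,X) = S0(t) ** exp(sum_i beta_i * x_i)."""
    s = s0_t ** math.exp(sum(b * xi for b, xi in zip(beta, x)))
    return -math.log(-math.log(s))

beta = [1.3, 1.6]                     # hypothetical coefficients
x1, x2 = [1, 2.5], [0, 3.0]           # two hypothetical individuals
baseline = [0.95, 0.80, 0.60, 0.35]   # hypothetical S0(t) at four times

# Vertical distance between the two log-log curves at each time point.
gaps = [cox_log_log(s0, beta, x1) - cox_log_log(s0, beta, x2) for s0 in baseline]

# The linear sum of beta_i * (X_2i - X_1i), which does not involve t.
expected_gap = sum(b * (v2 - v1) for b, v1, v2 in zip(beta, x1, x2))
```

Every entry of `gaps` agrees with `expected_gap` up to floating-point error, whatever baseline values are chosen, which is the parallelism property in numerical form.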


Graphical approach using log-log
plots: PH model is appropriate if
“empirical” plots of log-log survival
curves are parallel.

The parallelism of log-log survival plots for the
Cox PH model provides us with a graphical approach
for assessing the PH assumption. That is,
if a PH model is appropriate for a given set of predictors,
one should expect that empirical plots of
log-log survival curves for different individuals
will be approximately parallel.

Empirical plots: use $-\ln[-\ln \hat{S}]$
where

1. $\hat{S}$ is a KM curve

2. $\hat{S}$ is an adjusted survival curve
for predictors satisfying the PH
assumption; predictor being
assessed not included in model


By empirical plots, we mean plotting log-log sur¬ 
vival curves based on Kaplan-Meier (KM) esti¬ 
mates that do not assume an underlying Cox 
model. Alternatively, one could plot log-log sur¬ 
vival curves which have been adjusted for predic¬ 
tors already assumed to satisfy the PH assumption 
but have not included the predictor being assessed 
in a PH model. 
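
As a rough sketch of how empirical log-log curves of the first kind might be computed, the Python code below implements the Kaplan-Meier product-limit estimate from scratch and then applies the −ln(−ln) transformation. The function names and the two small data sets are made up for illustration; they are not the data analyzed in this chapter, and a statistical package would normally be used instead.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up times
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

def log_log_curve(km_curve):
    """Apply -ln(-ln S) wherever 0 < S < 1 (the transform is undefined otherwise)."""
    return [(t, -math.log(-math.log(s))) for t, s in km_curve if 0.0 < s < 1.0]

# Small made-up groups (time, event indicator), for illustration only.
group_a = kaplan_meier([6, 6, 7, 10, 13, 16], [1, 0, 1, 1, 0, 1])
group_b = kaplan_meier([1, 2, 3, 4, 5, 8], [1, 1, 1, 1, 1, 1])
loglog_a = log_log_curve(group_a)
loglog_b = log_log_curve(group_b)
```

Plotting `loglog_a` and `loglog_b` as step functions on the same axes would give the kind of picture used to judge parallelism. Points where the estimated survival reaches 0 are dropped because the transformation cannot be evaluated there.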



As an example, suppose we consider the compari¬ 
son of treatment and placebo groups in a clinical 
trial of leukemia patients, where survival time is 
time, in weeks, until a patient goes out of remis¬ 
sion. Two predictors of interest in this study are 
treatment group status (1 = placebo, 0 = treat¬ 
ment), denoted as Rx, and log white blood cell 
count (log WBC), where the latter variable is being 
considered as a confounder. 

A Cox PH model involving both these predictors 
would have the form shown at the left. To assess 
whether the PH assumption is satisfied for either 
or both of these variables, we would need to com¬ 
pare log-log survival curves involving categories 
of these variables. 

One strategy to take here is to consider the vari¬ 
ables one at a time. For the Rx variable, this 
amounts to plotting log-log KM curves for treat¬ 
ment and placebo groups and assessing paral¬ 
lelism. If the two curves are approximately par¬ 
allel, as shown here, we would conclude that the 
PH assumption is satisfied for the variable Rx. If 
the two curves intersect or are not parallel in some 
other way, we would conclude that the PH assump¬ 
tion is not satisfied for this variable. 

For the log WBC variable, we need to categorize
this variable into categories (say, low, medium,
and high) and then compare plots of log-log KM
curves for each of the three categories. In this illustration,
the three log-log Kaplan-Meier curves
are clearly nonparallel, indicating that the PH assumption
is not met for log WBC.

EXAMPLE: Computer Results



Problems with log-log survival 
curve approach: 

How parallel is parallel? 
Recommend: 

• subjective decision 

• conservative strategy: assume 
PH is OK unless strong evidence 
of nonparallelism 


The above examples are sketches of some of the 
possibilities that could occur from comparisons of 
log-log curves. For the actual data set containing 
42 leukemia patients, computer results are shown 
here for each variable separately. Similar output 
using Stata, SAS, and SPSS packages is provided 
in the Computer Appendix. 

We first show the log-log KM curves by treatment, 
Rx. Notice that the two log-log curves are roughly 
parallel, indicating that the Rx variable satisfies 
the PH assumption when being considered by it¬ 
self. 

Here we show the log-log KM curves by log WBC,
where we have divided this variable into low (below
2.3), medium (between 2.3 and 3), and high
(above 3) values. Notice that there is some indication
of nonparallelism below 8 weeks, but that
overall the three curves are roughly parallel. Thus,
these plots suggest that the PH assumption is more
or less satisfied for the variable log WBC, when
considered alone.

As a third example, we consider the log-log KM 
plots categorized by Sex from the remission data. 
Notice that the two curves clearly intersect, and 
are therefore noticeably nonparallel. Thus, the 
variable, Sex, when considered by itself, does not 
appear to satisfy the PH assumption and therefore 
should not be incorporated directly into a Cox PH 
model containing the other two variables, Rx and 
log WBC. 

The above examples suggest that there are some
problems associated with this graphical approach
for assessing the PH assumption. The main problem
concerns how to decide “how parallel is parallel?”
This decision can be quite subjective for a
given data set, particularly if the study size is relatively
small. We recommend that one should use
a conservative strategy for this decision by assuming
the PH assumption is satisfied unless there is
strong evidence of nonparallelism of the log-log
curves.

How to categorize a continuous
variable?

• many categories ⇒ data “thins
out”

• different categorizations may 
give different graphical pictures 

Recommend: 

• small # of categories (2 or 3) 

• meaningful choice 

• reasonable balance (e.g., 
terciles) 

How to evaluate several variables simultaneously?


Strategy: 

• categorize variables separately 

• form combinations of categories 

• compare log-log curves on same 
graph 

Drawback: 

• data “thins out” 

• difficult to identify variables 
responsible for nonparallelism 



Another problem concerns how to categorize a 
continuous variable like log WBC. If many cat¬ 
egories are chosen, the data “thins out” in each 
category, making it difficult to compare different 
curves. [Also, one categorization into, say, three 
groups may give a different graphical picture from 
a different categorization into three groups.] 

In categorizing continuous variables, we recom¬ 
mend that the number of categories be kept rea¬ 
sonably small (e.g., two or three) if possible, and 
that the choice of categories be as meaningful as 
possible and also provide reasonable balance of 
numbers (e.g., as when using terciles). 
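
One way to obtain balanced categories is to cut at the sample terciles. The sketch below uses Python's standard `statistics.quantiles` function with `n=3` to get two cut points and then labels each value. The log WBC values are made up for illustration, and the treatment of values equal to a cut point (assigned to the lower group) is an arbitrary choice of ours.

```python
import statistics

def tercile_labels(values):
    """Label each value low/medium/high using the sample terciles as cut points."""
    low_cut, high_cut = statistics.quantiles(values, n=3)
    labels = []
    for v in values:
        if v <= low_cut:
            labels.append("low")
        elif v <= high_cut:
            labels.append("medium")
        else:
            labels.append("high")
    return labels

# Made-up log WBC values, for illustration only.
log_wbc = [1.5, 2.0, 2.2, 2.4, 2.7, 2.9, 3.1, 3.6, 4.4]
groups = tercile_labels(log_wbc)
```

Cut points chosen this way give balance but need not be clinically meaningful, so in practice the two recommendations may have to be traded off against each other.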

In addition to the two problems just described, 
another problem with using log-log survival plots 
concerns how to evaluate the PH assumption for 
several variables simultaneously. 

One strategy for simultaneous comparisons is to 
categorize all variables separately, form combi¬ 
nations of categories, and then compare log-log 
curves for all combinations on the same graph. 
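
To illustrate how combined categories are formed, and how quickly they can thin out, the following sketch crosses a two-level variable with a three-level variable and counts subjects in each combination. The variable names and the seven subjects are invented for illustration.

```python
from collections import Counter
from itertools import product

rx_levels = [0, 1]                        # hypothetical two-level variable
wbc_levels = ["low", "medium", "high"]    # hypothetical three-level variable
combined = list(product(rx_levels, wbc_levels))   # 2 x 3 = 6 combinations

def stratum_sizes(rx, wbc_group):
    """Count subjects in each combination, including empty combinations."""
    counts = Counter(zip(rx, wbc_group))
    return {combo: counts.get(combo, 0) for combo in combined}

# Made-up subjects, for illustration only.
rx = [0, 0, 1, 1, 1, 0, 1]
wbc_group = ["low", "high", "low", "medium", "high", "high", "high"]
sizes = stratum_sizes(rx, wbc_group)
```

Inspecting `sizes` before plotting shows which combinations have too few subjects to give a usable log-log curve.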


A drawback of this strategy is that the data will 
again tend to “thin out” as the number of com¬ 
binations gets even moderately large. Also, even 
if there are sufficient numbers for each combined 
category, it is often difficult to determine which 
variables are responsible for any nonparallelism 
that might be found. 

As an example of this strategy, suppose we use the
remission data again and consider both Rx and
log WBC together. Because we previously had two
categories of Rx and three categories of log WBC,
we get a total of six combined categories, consisting
of treated subjects with low log WBC, placebo
subjects with low log WBC, treated subjects with
medium log WBC, and so on.

EXAMPLE (continued)


Log-log KM curves by six combinations of 
Rx by log WBC 



Plots suggest PH not satisfied. However, 
the study is small, i.e., plots are unreliable. 


The computer results are shown here for the log- 
log curves corresponding to each of the six com¬ 
binations of Rx with log WBC. Notice that there 
are several points of intersection among the six 
curves. Therefore, these results suggest that the 
PH assumption is not satisfied when considering 
Rx and log WBC together. 


However, the sample sizes used to estimate these
curves are quite small, ranging from four subjects
for group 4 (Rx = 1, log WBC = low) to
twelve subjects for group 6 (Rx = 1, log WBC =
high), with the total study size being 42. Thus, for
this small study, the use of six log-log curves provides
unreliable information for assessing the PH
assumption.


Alternative strategy: 

Adjust for predictors already
satisfying PH assumption, i.e., use
adjusted log-log $\hat{S}$ curves


An alternative graphical strategy for considering 
several predictors together is to assess the PH as¬ 
sumption for one predictor adjusted for other pre¬ 
dictors that are assumed to satisfy the PH assump¬ 
tion. Rather than using Kaplan-Meier curves, this 
involves a comparison of adjusted log-log survival 
curves. 


EXAMPLE 


Remission data: 

• compare Rx categories adjusted for log 
WBC 

• fit PH model for each Rx stratum 

• obtain adjusted survival curves using 
overall mean of log WBC 


Log-log S curves for Rx groups using PH 
model adjusted for log WBC 



As an example, again we consider the remission 
data and the predictors Rx and log WBC. To as¬ 
sess the PH assumption for Rx adjusted for log 
WBC, we would compare adjusted log-log survival 
curves for the two treatment categories, where 
each adjusted curve is derived from a PH model 
containing log WBC as a predictor. In computing 
the adjusted survival curve, we need to stratify the 
data by treatment, fit a PH model in each stratum, 
and then obtain adjusted survival probabilities us¬ 
ing the overall mean log WBC in the estimated sur¬ 
vival curve formula for each stratum. 

For the remission data example, the estimated
log-log survival curves for the two treatment
groups adjusted for log WBC are shown here. Notice
that these two curves are roughly parallel, indicating
that the PH assumption is satisfied for
treatment.

As another example, we consider adjusted log-log
survival curves for three categories of log WBC, adjusted
for the treatment status (Rx) variable. The
adjusted survival probabilities in this case use the
overall mean Rx score, i.e., 0.5, the proportion of
the 42 total subjects that are in the placebo group
(i.e., half the subjects have a score of Rx = 1).


The three log-log curves adjusted for treatment 
status are shown here. Although two of these 
curves intersect early in follow-up, they do not sug¬ 
gest a strong departure from parallelism overall, 
suggesting that the PH assumption is reasonable 
for log WBC, after adjusting for treatment status. 

As a third example, again using the remission data, 
we assess the PH assumption for Sex, adjusting for 
both treatment status and log WBC in the model. 
This involves obtaining log-log survival curves for 
males and females separately, using a PH model 
that contains both treatment status and log WBC. 
The adjustment uses the overall mean treatment 
score and the overall mean log WBC score in the 
formula for the estimated survival probability. 


The estimated log-log survival curves for Sex, ad¬ 
justed for treatment and log WBC are shown here. 
These curves clearly cross, indicating that the PH 
assumption is not satisfied for Sex, after adjusting 
for treatment and log WBC. 

✓ 1. log-log survival curves

2. observed versus expected
survival curves

We have thus described and illustrated one of the
two graphical approaches for checking the PH assumption,
that is, using log-log survival plots. In
the next section, we describe an alternative approach
that compares “observed” with “expected”
survival curves.


EXAMPLE (continued) 



IV. Graphical Approach 2: Observed Versus Expected Plots

Graphical analog of GOF test

The use of observed versus expected plots to assess
the PH assumption is the graphical analog of
the goodness-of-fit (GOF) testing approach to be
described later, and is therefore a reasonable alternative
to the log-log survival curve approach.

Two strategies:

1. One-at-a-time: uses KM curves to 
obtain observed plots 

2. Adjusting for other variables: 
uses stratified Cox PH model to 
obtain observed plots (see 
Chapter 5) 


As with the log-log approach, the observed versus 
expected approach may be carried out using ei¬ 
ther or both of two strategies—(1) assessing the 
PH assumption for variables one-at-a-time, or (2) 
assessing the PH assumption after adjusting for 
other variables. The strategy which adjusts for 
other variables uses a stratified Cox PH model to 
form observed plots, where the PH model contains 
the variables to be adjusted and the stratified vari¬ 
able is the predictor being assessed. The stratified 
Cox procedure is described in Chapter 5. 


Here, we describe only the one-at-a-time strat¬ 
egy, which involves using KM curves to obtain ob¬ 
served plots. 


One-at-a-time: 

• stratify data by categories of 
predictor 

• obtain KM curves for each 
category 


Using the one-at-a-time strategy, we first must 
stratify our data by categories of the predictor to 
be assessed. We then obtain observed plots by de¬ 
riving the KM curves separately for each category. 


EXAMPLE: Remission Data 



As an example, for the remission data on 42 
leukemia patients we have illustrated earlier, the 
KM plots for the treatment and placebo groups, 
with 21 subjects in each group, are shown here. 
These are the “observed” plots. 


To obtain “expected” plots, we fit a Cox PH model 
containing the predictor being assessed. We ob¬ 
tain expected plots by separately substituting the 
value for each category of the predictor into the 
formula for the estimated survival curve, thereby 
obtaining a separate estimated survival curve for 
each category. 

As an example, again using the remission data, we
fit the Cox PH model with Rx as its only variable.
Using the corresponding survival curve formula
for this Cox model, as given in the box at the left,
we then obtain separate expected plots by substituting
the values of 0 (for treatment group) and 1
(for placebo group). The expected plots are shown
here.

EXAMPLE (continued)


[Figure: Observed Versus Expected Plots by Rx; y-axis: S]



To compare observed with expected plots we then 
put both sets of plots on the same graph as shown 
here. 


If observed and expected plots are: 

• close, complies with PH 
assumption 

• discrepant, PH assumption 
violated 


If for each category of the predictor being as¬ 
sessed, the observed and expected plots are “close” 
to one another, we then can conclude that the PH 
assumption is satisfied. If, however, one or more 
categories show quite discrepant observed and ex¬ 
pected plots, we conclude that the PH assumption 
is violated. 


For the example shown above, observed and ex¬ 
pected curves appear to be quite close for each 
treatment group. Thus, we would conclude using 
this graphical approach that the treatment vari¬ 
able satisfies the PH assumption. 

Drawback: How close is close?

Recommend: PH not satisfied only
when plots are strongly discrepant.

An obvious drawback to this graphical approach
is deciding “how close is close” when comparing
observed versus expected curves for a given category.
This is analogous to deciding “how parallel
is parallel” when comparing log-log survival
curves. Here, we recommend that the PH
assumption be considered as not satisfied only
when observed and expected plots are strongly
discrepant.


EXAMPLE: Remission Data (continued)


Observed and expected plots are close
for each treatment group.

Conclude PH assumption not violated.

EXAMPLE: Remission Data


[Figure: Observed Versus Expected Plots by Sex; y-axis: S]



PH assumption not satisfied for Sex. 
Same conclusion as with log-log curves. 


Continuous variable: 

• form strata from categories 

• observed plots are KM curves 
for each category 


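• two options for expected plots
1. Use PH model with k − 1
dummy variables $X_i$ for k
categories, i.e.,

$$h(t,\mathbf{X}) = h_0(t)\exp\left(\sum_{i=1}^{k-1} \beta_i X_i\right)$$

Obtain adjusted survival
curve:

$$\hat{S}(t,\mathbf{X}_c) = \left[\hat{S}_0(t)\right]^{\exp\left(\sum_{i=1}^{k-1} \hat{\beta}_i X_{ci}\right)}$$

where
$\mathbf{X}_c = (X_{c1}, X_{c2}, \ldots, X_{c,k-1})$
gives values of dummy
variables for category c.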


As another example, again using the remission 
data, we consider observed versus expected plots 
by Sex, as shown here. Note that the observed plots 
for males and females, which are described by the 
thicker lines, cross at about 12 weeks, whereas the 
expected plots don’t actually intersect, with the fe¬ 
male plot lying below the male plot throughout 
follow-up. Moreover, for males and females sepa¬ 
rately, the observed and expected plots are quite 
different from one another. 


Thus, the above plots suggest that the PH assump¬ 
tion is not satisfied for the variable Sex. We came 
to the same conclusion when using log-log sur¬ 
vival curves, which crossed one another and were 
therefore clearly nonparallel. 

When using observed versus expected plots to as¬ 
sess the PH assumption for a continuous variable, 
observed plots are derived, as for categorical vari¬ 
ables, by forming strata from categories of the con¬ 
tinuous variable and then obtaining KM curves for 
each category. 
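
The observed plots described above can be sketched in code. The following is a minimal illustration, not the procedure used by any particular package: it computes a Kaplan-Meier curve from follow-up times and event indicators, and then repeats the calculation within strata formed from a continuous variable. The function names and the cutpoints argument are hypothetical, introduced only for this sketch.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct failure time.

    times  : follow-up time for each subject
    events : 1 if the subject failed at that time, 0 if censored
    """
    # Distinct times at which at least one failure occurred, in order.
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv = 1.0
    curve = []
    for t in event_times:
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        failed = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - failed / at_risk
        curve.append((t, surv))
    return curve


def km_by_category(times, events, values, cutpoints):
    """KM curve within each stratum of a continuous variable.

    cutpoints are hypothetical category boundaries: a value v falls in
    stratum k, where k is the number of cutpoints that v meets or exceeds.
    """
    strata = {}
    for t, e, v in zip(times, events, values):
        k = sum(1 for c in cutpoints if v >= c)
        strata.setdefault(k, ([], []))
        strata[k][0].append(t)
        strata[k][1].append(e)
    return {k: kaplan_meier(ts, es) for k, (ts, es) in sorted(strata.items())}
```

With two cutpoints, km_by_category returns three curves (strata 0, 1, and 2), which would play the role of the low, medium, and high observed plots.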

However, for continuous predictors, there are two 
options available for computing expected plots. 
One option is to use a Cox PH model that contains
k − 1 dummy variables to indicate k categories.
The expected plot for a given category is
then obtained as an adjusted survival curve by sub¬ 
stituting the values for the dummy variables that 
define the given category into the formula for the 
estimated survival curve, as shown here for cate¬ 
gory c. 
Options for a continuous variable:

2. Use PH model:

$$h(t,X) = h_0(t)\exp(\beta X)$$

where X is the continuous predictor.

Obtain adjusted survival curve:

$$\hat{S}(t,\bar{X}_c) = \left[\hat{S}_0(t)\right]^{\exp(\hat{\beta}\bar{X}_c)}$$

where $\bar{X}_c$ denotes the mean value
for the variable X within category
c.


The second option is to use a Cox PH model
containing the continuous predictor being assessed.
Expected plots are then obtained as adjusted
survival curves by specifying predictor values that
distinguish categories, as, for example, when using
mean predictor values for each category.


EXAMPLE: Remission Data 


[Figure: Observed (KM) Plots by log WBC Categories]



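Option 1:

$$h(t,X) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2)$$

where

$$X_1 = \begin{cases}1 & \text{if high}\\ 0 & \text{if other}\end{cases} \qquad X_2 = \begin{cases}1 & \text{if medium}\\ 0 & \text{if other}\end{cases}$$

so that

high = (1, 0); medium = (0, 1); low = (0, 0)


Expected survival plots:

$X_1 = 1, X_2 = 0$: $\hat{S}(t, X_{\text{high}}) = [\hat{S}_0(t)]^{\exp(\hat{\beta}_1)}$

$X_1 = 0, X_2 = 1$: $\hat{S}(t, X_{\text{medium}}) = [\hat{S}_0(t)]^{\exp(\hat{\beta}_2)}$

$X_1 = 0, X_2 = 0$: $\hat{S}(t, X_{\text{low}}) = [\hat{S}_0(t)]$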

As an example to illustrate both options, we con¬ 
sider the continuous variable log WBC from the 
remission data example. To assess the PH assump¬ 
tion for this variable, we would first stratify log 
WBC into, say, three categories—low, medium, 
and high. The observed plots would then be ob¬ 
tained as KM curves for each of the three strata, 
as shown here. 


Using option 1, expected plots would be obtained
by fitting a Cox PH model containing two dummy
variables $X_1$ and $X_2$, as shown here, where $X_1$
takes the value 1 if high or 0 if other and $X_2$ takes
the value 1 if medium or 0 if other. Thus, when log
WBC is high, the values of $X_1$ and $X_2$ are 1 and 0,
respectively; whereas when log WBC is medium,
the values are 0 and 1, respectively; and when log
WBC is low, the values are both 0.

The expected survival plots for high, medium, and
low categories are then obtained by substituting
each of the three specifications of $X_1$ and $X_2$ into
the formula for the estimated survival curve, and
then plotting the three curves.
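
As a rough sketch of the substitution step, the code below raises a baseline survival curve to the power exp(Σ β_i X_i) for each dummy-variable pattern. The baseline values and coefficients are hypothetical numbers chosen for illustration; they are not estimates from the remission data, and the function name is our own.

```python
import math


def expected_survival(baseline, betas, x):
    """Expected survival S(t, x) = S0(t) ** exp(sum_i beta_i * x_i).

    baseline : estimated baseline survival values S0(t) on a time grid
    betas    : estimated regression coefficients
    x        : covariate values that define the category of interest
    """
    risk = math.exp(sum(b * xi for b, xi in zip(betas, x)))
    return [s0 ** risk for s0 in baseline]


# Hypothetical inputs (assumptions for illustration only).
baseline = [1.00, 0.90, 0.75, 0.60]   # S0(t) at four time points
betas = [1.2, 0.5]                    # coefficients of X1 (high) and X2 (medium)

# Dummy-variable patterns for the three categories.
categories = {"high": (1, 0), "medium": (0, 1), "low": (0, 0)}
expected = {name: expected_survival(baseline, betas, x)
            for name, x in categories.items()}
```

The same function covers the second option for a continuous predictor: pass a single coefficient and the category mean, for example expected_survival(baseline, [beta_hat], [3.83]), where beta_hat is whatever estimate the fitted model provides.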
EXAMPLE (continued)


[Figure: Expected Plots for log WBC Using
Option 1 (Dummy Variables), with curves for the
low, medium, and high categories]


[Figure: Observed Versus Expected Plots Using
Option 1]



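Option 2: Treat log WBC as continuous

$$h(t,X) = h_0(t)\exp[\beta(\log\text{WBC})]$$


$\overline{\log\text{WBC}}_{\text{high}} = 3.83$:

$$\hat{S}(t,\bar{X}_{\text{high}}) = [\hat{S}_0(t)]^{\exp[3.83\hat{\beta}]}$$


$\overline{\log\text{WBC}}_{\text{med}} = 2.64$:

$$\hat{S}(t,\bar{X}_{\text{med}}) = [\hat{S}_0(t)]^{\exp[2.64\hat{\beta}]}$$

$\overline{\log\text{WBC}}_{\text{low}} = 1.71$:

$$\hat{S}(t,\bar{X}_{\text{low}}) = [\hat{S}_0(t)]^{\exp[1.71\hat{\beta}]}$$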

[Figure: Observed Versus Expected Plots for log
WBC Using Option 2]


The expected plots using option 1 (the dummy 
variable approach) are shown here for the three 
categories of log WBC. 


Here we put the observed and expected plots on 
the same graph. Although there are some discrep¬ 
ancies, particularly early in follow-up for the low 
log WBC category, these plots suggest overall that 
the PH assumption is satisfied for log WBC. 


Using option 2, expected plots would be obtained 
by first fitting a Cox PH model containing the con¬ 
tinuous variable log WBC, as shown here. 

Adjusted survival curves are then obtained for 
specified values of log WBC that summarize the 
three categories used to form observed curves. 
Here, we find that the mean log WBC scores for 
low, medium, and high categories are, respec¬ 
tively, 1.71, 2.64, and 3.83. These values are sub¬ 
stituted into the estimated survival curve formula 
as shown here. 

Here are the observed and expected plots us¬ 
ing option 2. As with option 1, although there 
are some discrepancies within categories, overall, 
these plots suggest that the PH assumption is sat¬ 
isfied for the log WBC variable. 
V. The Goodness of Fit (GOF)
Testing Approach

Statistical test appealing 

• Provides p-value 

• More objective decision than 
when using graphical approach 

Test of Harrell and Lee (1986)

• Variation of test of Schoenfeld 

• Uses Schoenfeld residuals 


Schoenfeld residuals defined for 

• Each predictor in model 

• Every subject who has event 

Consider Cox PH model

$$h(t) = h_0(t)\exp(\beta_1\,\text{RX} + \beta_2\,\text{log WBC} + \beta_3\,\text{SEX})$$

3 predictors → 3 Schoenfeld residuals for each
subject who has event

Schoenfeld residual for ith subject 
for LOGWBC 

Observed LOGWBC 
- LOGWBC weighted average 

Weights are other subjects’ hazard 
(from subjects still at risk) 

Underlying idea of test 
If PH holds then Schoenfeld residu¬ 
als uncorrelated with time 


The GOF testing approach is appealing because it 
provides a test statistic and p-value for assessing 
the PH assumption for a given predictor of inter¬ 
est. Thus, the researcher can make a more objec¬ 
tive decision using a statistical test than is typically 
possible when using either of the two graphical ap¬ 
proaches described above. 


A number of different tests for assessing the PH as¬ 
sumption have been proposed in the literature. We 
present the test of Harrell and Lee (1986), a variation
of a test originally proposed by Schoenfeld
(1982) and based on the residuals defined by 
Schoenfeld, now called the Schoenfeld residuals. 

For each predictor in the model, Schoenfeld resid¬ 
uals are defined for every subject who has an event. 
For example, consider a Cox PH model with three 
predictors: RX, LOGWBC, and SEX. Then there 
are three Schoenfeld residuals defined for each 
subject who has an event, one for each of the three 
predictors. 


Suppose subject i has an event at time tj. Then 
her Schoenfeld residual for LOGWBC is her ob¬ 
served value of log white blood cell count minus a 
weighted average of the log white blood cell counts 
for the other subjects still at risk at time tj. The 
weights are each subject's hazard. 
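
To make the definition concrete, here is a minimal single-covariate sketch. It assumes the conventional risk set (all subjects whose follow-up time is at least the event time, which includes subject i) and uses weights exp(β x), which are proportional to each subject's hazard under the Cox PH model. The function name and argument layout are our own, not taken from any package.

```python
import math


def schoenfeld_residual(i, times, covariate, beta):
    """Schoenfeld residual for subject i, who has an event at times[i].

    times     : follow-up time for every subject
    covariate : covariate value (e.g., log WBC) for every subject
    beta      : estimated Cox coefficient for the covariate
    """
    t_event = times[i]
    # Risk set: everyone whose follow-up time is at least the event time.
    risk_set = [j for j, t in enumerate(times) if t >= t_event]
    # Weights proportional to each subject's hazard under the PH model.
    weights = [math.exp(beta * covariate[j]) for j in risk_set]
    weighted_avg = sum(w * covariate[j] for w, j in zip(weights, risk_set)) / sum(weights)
    return covariate[i] - weighted_avg
```

When beta is zero every subject gets the same weight, so the residual is simply the observed value minus the ordinary mean over the risk set.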


The idea behind the statistical test is that if the
PH assumption holds for a particular covariate
then the Schoenfeld residuals for that covariate
will not be related to survival time.
Steps for test implementation

1. Obtain Schoenfeld residuals

2. Rank failure times

3. Test correlation of residuals to
ranked failure time, $H_0\colon \rho = 0$


$H_0$ rejected

Conclude PH assumption violated

PH test in Stata, SAS, SPSS 
shown in Computer Appendix 

Stata uses scaled Schoenfeld 
residuals rather than Schoenfeld 
residuals (typically similar results) 


EXAMPLE: Remission Data


Column name   Coeff.   StErr.   P(PH)

Rx            1.294    0.422    0.917

log WBC       1.604    0.329    0.944


Both variables satisfy PH assumption.

Note: P(PH) = 0.917 assesses PH for
Rx, assuming PH OK for log WBC.


The implementation of the test can be thought of 
as a three-step process. 

Step 1. Run a Cox PH model and obtain 
Schoenfeld residuals for each predictor. 

Step 2. Create a variable that ranks the order of 
failures. The subject who has the first (earliest) 
event gets a value of 1, the next gets a value of 
2, and so on. 

Step 3. Test the correlation between the vari¬ 
ables created in the first and second steps. The 
null hypothesis is that the correlation between 
the Schoenfeld residuals and ranked failure 
time is zero. 
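
The three steps can be sketched as follows, taking the Schoenfeld residuals from step 1 as given. This is only an illustration of the idea: the large-sample normal approximation used for the p-value is our assumption, tied failure times are not handled, and the statistics actually reported by Stata, SAS, or SPSS may be computed differently.

```python
import math


def pearson_r(x, y):
    """Ordinary Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_failure_times(event_times):
    """Step 2: rank 1 for the earliest event, 2 for the next, and so on."""
    order = sorted(range(len(event_times)), key=lambda k: event_times[k])
    ranks = [0] * len(event_times)
    for rank, k in enumerate(order, start=1):
        ranks[k] = rank
    return ranks


def ph_correlation_test(residuals, event_times):
    """Step 3: test H0 that residuals and ranked failure time are uncorrelated.

    Returns (r, z, p). Not defined when |r| equals 1 exactly.
    """
    ranks = rank_failure_times(event_times)
    r = pearson_r(residuals, ranks)
    n = len(residuals)
    z = r * math.sqrt((n - 2) / (1.0 - r * r))
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal tail area
    return r, z, p
```

A small p-value from ph_correlation_test would be read, as in the text, as evidence against the PH assumption for the covariate whose residuals were supplied.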

Rejection of the null hypothesis leads to a conclu¬ 
sion that the PH assumption is violated. 

The implementation of the test for the PH assumption
in Stata, SAS, and SPSS is shown in the Computer
Appendix. Stata uses a slight variation of the
test we just described in that it uses the scaled
Schoenfeld residual rather than the Schoenfeld
residual (Grambsch and Therneau, 1994). The
tests typically (but not always) yield similar
results.

To illustrate the statistical test approach, we return 
to the remission data example. The printout on the 
left gives p-values P(PH) for treatment group and 
log WBC variables based on fitting a Cox PH model 
containing these two variables. 

The P(PH) values are quite high for both variables, 
suggesting that both variables satisfy the PH as¬ 
sumption. Note that each of these p-values tests 
the assumption for one variable given that the 
other predictors are included in the model. For 
example, the P(PH) of 0.917 assesses the PH as¬ 
sumption for Rx, assuming the PH assumption is 
satisfied for log WBC. 
EXAMPLE


Column name   Coeff.   StErr.   P(PH)

Rx            1.391    0.457    0.935

log WBC       1.594    0.330    0.828

Sex           0.263    0.449    0.038

log WBC and Rx satisfy PH.

Sex does not satisfy PH.

(Same conclusions using graphical
approaches).


As another example, consider the computer re¬ 
sults shown here for a Cox PH model contain¬ 
ing the variable SEX in addition to log WBC and 
treatment group. The P(PH) values for log WBC 
and treatment group are still nonsignificant. How¬ 
ever, the P(PH) value for SEX is significant below 
the 0.05 level. This result suggests that log WBC 
and treatment group satisfy the PH assumption, 
whereas SEX does not. We came to the same con¬ 
clusion about these variables using the graphical 
procedures described earlier. 


Statistical Tests 

Null is never proven 

• May say not enough evidence to 
reject 

p-value can be driven by sample size 

• Small sample—gross violation 
of null may not be significant 

• Large sample—slight violation 
of null may be highly significant 


An important point concerning a testing approach 
is that the null hypothesis is never proven with a 
statistical test. The most that may be said is that 
there is not enough evidence to reject the null. A 
p-value can be driven by sample size. A gross viola¬ 
tion of the null assumption may not be statistically 
significant if the sample is very small. Conversely, 
a slight violation of the null assumption may be 
highly significant if the sample is very large. 
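
The dependence of the p-value on sample size can be seen with a small calculation. The sketch below holds a correlation fixed at a hypothetical value of 0.10 and evaluates a two-sided p-value at two sample sizes, using a large-sample normal approximation that we assume purely for illustration.

```python
import math


def two_sided_p(r, n):
    """Approximate two-sided p-value for a correlation r based on n observations."""
    z = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(z) / math.sqrt(2.0))


# Same (hypothetical) departure from the null, two different sample sizes.
p_small = two_sided_p(0.10, 20)
p_large = two_sided_p(0.10, 2000)
```

Here p_small is far from significant at the 0.05 level, whereas p_large is well below 0.001, even though the size of the departure from the null is identical.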


Test: more objective
Graph: more subjective, but can
detect specific violations

Recommend—Use both graphs and 
tests 


A statistical test offers a more objective approach 
for assessing the PH assumption compared to the 
subjectivity of the graphical approach. However, 
the graphical approach enables the researcher to 
detect specific kinds of departures from the PH 
assumption; the researcher can see what is going 
on from the graph. Consequently, we recommend 
that when assessing the PH assumption, the inves¬ 
tigator use both graphical procedures and statis¬ 
tical testing before making a final decision. 


VI. Assessing the PH
Assumption Using Time-
Dependent Covariates

Extended Cox model:
contains product terms of the form
X × g(t), where g(t) is a function
of time.


When time-dependent variables are used to assess 
the PH assumption for a time-independent vari¬ 
able, the Cox model is extended to contain prod¬ 
uct (i.e., interaction) terms involving the time- 
independent variable being assessed and some 
function of time. 
One-at-a-time model:

$$h(t,X) = h_0(t)\exp[\beta X + \delta X \times g(t)]$$


When assessing predictors one-at-a-time, the ex¬ 
tended Cox model takes the general form shown 
here for the predictor X. 


Some choices for g(t):
g(t) = t
g(t) = log t

$$g(t) = \begin{cases}1 & \text{if } t \ge t_0\\ 0 & \text{if } t < t_0\end{cases}$$
(heaviside function)


One choice for the function g(t) is simply g(t) equal
to t, so that the product term takes the form X × t.
Other choices for g(t) are also possible, for example,
log t.
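
The choices for g(t) listed here can be written as small functions, with the product term computed as X × g(t). This is a sketch only; the function names are hypothetical and the cutpoint t0 is whatever the investigator specifies.

```python
import math


def g_linear(t):
    """g(t) = t"""
    return t


def g_log(t):
    """g(t) = log t (requires t > 0)"""
    return math.log(t)


def make_heaviside(t0):
    """Return the heaviside function g(t) = 1 if t >= t0, 0 if t < t0."""
    def g(t):
        return 1.0 if t >= t0 else 0.0
    return g


def product_term(x, g, t):
    """Value of the time-dependent product term X x g(t) at time t."""
    return x * g(t)
```

For instance, product_term(1, make_heaviside(7), t) is 0 before week 7 and 1 from week 7 onward for a subject with X = 1.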


$H_0\colon \delta = 0$

Under $H_0$, the model reduces to:
$$h(t,X) = h_0(t)\exp[\beta X]$$


Using the above one-at-a-time model, we assess
the PH assumption by testing for the significance
of the product term. The null hypothesis is therefore
“$\delta$ equal to zero.” Note that if the null hypothesis
is true, the model reduces to a Cox PH model
containing the single variable X.


Use either Wald statistic or
likelihood ratio statistic:

$\chi^2$ with 1 df under $H_0$


The test can be carried out using either a Wald 
statistic or a likelihood ratio statistic. In either 
case, the test statistic has a chi-square distribu¬ 
tion with one degree of freedom under the null 
hypothesis. 
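
As a sketch of the Wald version of this test, the code below squares the ratio of the estimated product-term coefficient to its standard error and refers the result to a chi-square distribution with 1 degree of freedom. The upper-tail area for 1 df is computed with the identity P(χ² > x) = erfc(√(x/2)); the function names and the numbers in the usage line are hypothetical.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def wald_test(delta_hat, se):
    """Wald chi-square statistic and p-value for H0: delta = 0."""
    stat = (delta_hat / se) ** 2
    return stat, chi2_sf_1df(stat)


# Hypothetical estimate and standard error, for illustration only.
stat, p_value = wald_test(0.5, 0.25)
```

With these made-up inputs the statistic is 4.0 and the p-value falls just below 0.05, so the product term would be judged significant at that level.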


EXAMPLE 


$$h(t,X) = h_0(t)\exp[\beta_1\,\text{Sex} + \beta_2(\text{Sex} \times t)]$$
$\beta_2 \ne 0 \Rightarrow$ PH assumption violated


For example, if the PH assumption is being assessed
for Sex, a Cox model might be extended to
include the variable Sex × t in addition to Sex. If
the coefficient of the product term turns out to be
significant, we can conclude that the PH assumption
is violated for Sex.²


Strategies for assessing PH: 

• one-at-a-time 

• several predictors 
simultaneously 

• for a given predictor adjusted for 
other predictors 


In addition to a one-at-a-time strategy, the ex¬ 
tended Cox model can also be used to assess the 
PH assumption for several predictors simultane¬ 
ously as well as for a given predictor adjusted for 
other predictors in the model. 


² In contrast, if the test for $H_0\colon \beta_2 = 0$ is nonsignificant, we
can conclude only that the particular version of the extended
Cox model being considered is not supported by the data.
Several predictors simultaneously:

$$h(t,X) = h_0(t)\exp\left(\sum_{i=1}^{p}\left[\beta_i X_i + \delta_i X_i \times g_i(t)\right]\right)$$

$g_i(t)$ = function of time for ith
predictor


To assess the PH assumption for several predictors
simultaneously, the form of the extended model
is shown here. This model contains the predictors
being assessed as main effect terms and also
as product terms with some function of time.
Note that different predictors may require different
functions of time; hence, the notation $g_i(t)$
is used to define the time function for the ith
predictor.


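$H_0\colon \delta_1 = \delta_2 = \cdots = \delta_p = 0$

$$LR = -2\ln L_{\text{PH model}} - \left(-2\ln L_{\text{ext. Cox model}}\right) \sim \chi^2_p \text{ under } H_0$$


Cox PH (reduced) model:

$$h(t,X) = h_0(t)\exp\left[\sum_{i=1}^{p}\beta_i X_i\right]$$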



With the above model, we test for the PH assumption
simultaneously by assessing the null hypothesis
that all the $\delta_i$ coefficients are equal to zero.
This requires a likelihood ratio chi-square statistic
with p degrees of freedom, where p denotes
the number of predictors being assessed. The LR
statistic computes the difference between the log
likelihood statistic, −2 ln L, for the PH model
and the log likelihood statistic for the extended
Cox model. Note that under the null hypothesis,
the model reduces to the Cox PH model shown
here.
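
The likelihood ratio computation can be sketched as below. The two log likelihoods would come from fitting the PH model and the extended Cox model in whatever software is being used; the values in the usage lines are hypothetical. The chi-square tail area for integer degrees of freedom is obtained from a standard recurrence for the incomplete gamma function, so no external library is assumed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        a = 0.5
        q = math.erfc(math.sqrt(half))   # tail area for 1 df
    else:
        a = 1.0
        q = math.exp(-half)              # tail area for 2 df
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1),
    # applied until a reaches df / 2.
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


def lr_test(loglik_ph, loglik_extended, p):
    """LR statistic (-2 ln L for PH model minus -2 ln L for extended model) and p-value."""
    stat = -2.0 * loglik_ph - (-2.0 * loglik_extended)
    return stat, chi2_sf(stat, p)


# Hypothetical log likelihoods for a model with three product terms.
stat, p_value = lr_test(-72.0, -68.0, 3)
```

Because the extended model contains the PH model as a special case, its log likelihood can be no smaller, so the statistic is nonnegative.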


EXAMPLE: Remission Data 


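$$h(t,X) = h_0(t)\exp[\beta_1(\text{Rx}) + \beta_2(\text{log WBC}) + \beta_3(\text{Sex}) + \delta_1(\text{Rx}) \times g(t) + \delta_2(\text{log WBC}) \times g(t) + \delta_3(\text{Sex}) \times g(t)]$$

where
$$g(t) = \begin{cases}1 & \text{if } t \ge 7\\ 0 & \text{if } t < 7\end{cases}$$

$H_0\colon \delta_1 = \delta_2 = \delta_3 = 0$

LR ~ $\chi^2$ with 3 df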

If test is significant, use backward 
elimination to find predictors not 
satisfying PH assumption. 


As an example, we assess the PH assumption for
the predictors Rx, log WBC, and Sex from the
remission data considered previously. The extended
Cox model is given as shown here, where the
functions $g_i(t)$ have been chosen to be the same
“heaviside” function defined by g(t) equals 1 if t is
7 weeks or more and g(t) equals 0 if t is less than
7 weeks. The null hypothesis is that all three $\delta$
coefficients are equal to zero. The test statistic is a
likelihood-ratio chi-square with 3 degrees of freedom.

If the above test is found to be significant, then we 
can conclude that the PH assumption is not satis¬ 
fied for at least one of the predictors in the model. 
To determine which predictor(s) do not satisfy the 
PH assumption, we could proceed by backward 
elimination of nonsignificant product terms until 
a final model is attained. 
Heaviside function:

$$g(t) = \begin{cases}1 & \text{if } t \ge 7\\ 0 & \text{if } t < 7\end{cases}$$

h(t,X) differs for t ≥ 7 and t < 7.


Properties of heaviside functions 
and numerical results are described 
in Chapter 6. 


Note that the use of a heaviside function for g(t) 
in the above example yields different expressions 
for the hazard function depending on whether t 
is greater than or equal to 7 weeks or t is less 
than 7 weeks. Chapter 6 provides further details 
on the properties of heaviside functions, and also 
provides numerical results from fitting extended 
Cox models. 
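
To spell out the two expressions, we can substitute the two values of g(t) into the extended model for Rx, log WBC, and Sex given in the example. The following is a direct substitution under that model, not an additional result:

```latex
% For t < 7, g(t) = 0, so the product terms drop out:
h(t,X) = h_0(t)\exp\left[\beta_1\,\text{Rx} + \beta_2\,\text{log WBC} + \beta_3\,\text{Sex}\right]

% For t \ge 7, g(t) = 1, so each coefficient is shifted by its \delta:
h(t,X) = h_0(t)\exp\left[(\beta_1+\delta_1)\,\text{Rx} + (\beta_2+\delta_2)\,\text{log WBC} + (\beta_3+\delta_3)\,\text{Sex}\right]
```

Thus, for example, the hazard ratio for a one-unit change in Rx would be exp(β₁) before week 7 and exp(β₁ + δ₁) from week 7 onward; the two coincide only if δ₁ = 0.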


Assessing PH for a given predictor
adjusted for other predictors:

$$h(t,X) = h_0(t)\exp\left[\sum_{i=1}^{p-1}\beta_i X_i + \beta^* X^* + \delta^* X^* \times g(t)\right]$$

$X^*$ = Predictor of interest
$H_0\colon \delta^* = 0$

Wald or LR statistic ~ $\chi^2$ with 1 df


We show here an extended Cox model that can be
used to evaluate the PH assumption for a given
predictor adjusted for predictors already satisfying
the PH assumption. The predictor of interest
is denoted as $X^*$, and the predictors considered
to satisfy the PH assumption are denoted as
$X_i$. The null hypothesis is that the coefficient $\delta^*$ of
the product term $X^* \times g(t)$ is equal to zero. The test
statistic can either be a Wald statistic or a likelihood
ratio statistic, with either statistic having a
chi-square distribution with 1 degree of freedom
under the null hypothesis.



As an example, suppose, again considering the re¬ 
mission data, we assess the PH assumption for the 
variable, Sex, adjusted for the variables Rx and log 
WBC, which we assume already satisfy the PH as¬ 
sumption. Then, the extended Cox model for this 
situation is shown here. 
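
The displayed model for this example is not reproduced in this text. Substituting Rx and log WBC for the adjusting predictors and Sex for the predictor of interest in the general form above suggests the following reconstruction, which should be read as our inference from that general form rather than a quoted formula:

```latex
h(t,X) = h_0(t)\exp\left[\beta_1\,\text{Rx} + \beta_2\,\text{log WBC}
         + \beta^{*}\,\text{Sex} + \delta^{*}\,(\text{Sex} \times g(t))\right]

% Null hypothesis for the PH assumption on Sex:
H_0\colon \delta^{*} = 0
```

The choice of g(t) is left unspecified here, as in the general form.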


Two models for LR test of PH: 

1. Cox PH model 

2. extended Cox model 

See Computer Appendix for Stata, 
SAS, and SPSS 


To carry out the computations for any of the like¬ 
lihood ratio tests described above, two different 
types of models, a PH model and an extended Cox 
model, need to be fit. See the Computer Appendix 
for details on how the extended Cox model is fit 
using SAS, SPSS, and Stata. 


Drawback: choice of $g_i(t)$

Different choices may lead to differ¬ 
ent conclusions about PH assump¬ 
tion. 


The primary drawback of the use of an extended
Cox model for assessing the PH assumption concerns
the choice of the functions $g_i(t)$ for the
time-dependent product terms in the model. This
choice is typically not clear-cut, and it is possible
that different choices, such as g(t) equal to t versus
log t versus a heaviside function, may result
in different conclusions about whether the PH
assumption is satisfied.
Chapter 6: Time-dependent
covariates


Further discussion of the use of time-dependent 
covariates in an extended Cox model is provided 
in Chapter 6. 


This presentation: 

Three methods for assessing PH. 

i. graphical 

ii. GOF 

iii. time-dependent covariates 

Recommend using at least two 
methods. 


This presentation is now complete. We have de¬ 
scribed and illustrated three methods for assess¬ 
ing the PH assumption: graphical, goodness-of- 
fit (GOF), and time-dependent covariate methods. 
Each of these methods has both advantages and 
drawbacks. We recommend that the researcher 
use at least two of these approaches when assess¬ 
ing the PH assumption. 


Chapters 


1. Introduction to Survival 
Analysis 

2. Kaplan-Meier Survival Curves 
and the Log-Rank Test 

3. The Cox Proportional Hazards 
Model and Its Characteristics 


✓ 4. Evaluating the Proportional
Hazards Assumption


We suggest that the reader review this presenta¬ 
tion using the detailed outline that follows. Then 
answer the practice exercises and the test that fol¬ 
low. 

The next Chapter (5) is entitled “The Stratified Cox 
Procedure.” There, we describe how to use a strat¬ 
ification procedure to fit a PH model when one 
or more of the predictors do not satisfy the PH 
assumption. 


Next: 


5. The Stratified Cox Procedure 

6. Extension of the Cox 
Proportional Hazards Model 
for Time-Dependent Variables 
Detailed
Outline


I. Background

A. The formula for the Cox PH model:

$$h(t,X) = h_0(t)\exp\left[\sum_{i=1}^{p}\beta_i X_i\right]$$

B. Formula for hazard ratio comparing two
individuals, $X^* = (X_1^*, X_2^*, \ldots, X_p^*)$ and
$X = (X_1, X_2, \ldots, X_p)$:

$$\frac{h(t,X^*)}{h(t,X)} = \exp\left[\sum_{i=1}^{p}\beta_i (X_i^* - X_i)\right]$$

C. Adjusted survival curves using the Cox PH model:

$$S(t,X) = [S_0(t)]^{\exp\left[\sum \beta_i X_i\right]}$$

i. To graph S(t, X), must specify values for
$X = (X_1, X_2, \ldots, X_p)$.

ii. To obtain “adjusted” survival curves, usually use
overall mean values for the X’s being adjusted.

D. The meaning of the PH assumption

i. Hazard ratio formula shows that hazard ratio is
independent of time:

$$\frac{h(t,X^*)}{h(t,X)} = \theta$$

ii. Hazards for two X’s are proportional:

$$h(t,X^*) = \theta\, h(t,X)$$
II. Checking the PH assumption: Overview

A. Three methods for checking the PH assumption:

i. Graphical: compare −ln(−ln) survival curves or
observed versus predicted curves.

ii. Goodness-of-fit test: use a large sample Z
statistic.

iii. Time-dependent covariates: use product (i.e.,
interaction) terms of the form X × g(t).

B. Abbreviated illustrations of each method are 
provided. 

III. Graphical approach 1: log-log plots
A. A log-log curve is a transformation of an estimated
survival curve, where the scale for a log-log curve is
−∞ to +∞.
B. The log-log expression for the Cox model survival
curve is given by

$$-\ln[-\ln S(t,X)] = -\sum_{i=1}^{p}\beta_i X_i - \ln[-\ln S_0(t)]$$

C. For the Cox model, the log-log survival curve for
individual $X_1$ can be written as the log-log curve for
individual $X_2$ plus a linear sum term that is
independent of time t. This formula is given by

$$-\ln[-\ln S(t,X_1)] = -\ln[-\ln S(t,X_2)] + \sum_{i=1}^{p}\beta_i (X_{2i} - X_{1i})$$

D. The above log-log formula can be used to check the 
PH assumption as follows: the PH model is 
appropriate if “empirical” plots of log-log survival 
curves are parallel. 

E. Two kinds of empirical plots for $-\ln(-\ln \hat{S})$:

i. $\hat{S}$ is a KM curve

ii. $\hat{S}$ is an adjusted survival curve where predictor
being assessed is not included in the Cox
regression model.

F. Several examples of log-log plots are provided using 
remission data from a clinical trial of leukemia 
patients. 

G. Problems with log-log curves: 

i. How parallel is parallel? 

ii. How to categorize a continuous variable? 

iii. How to evaluate several variables 
simultaneously? 

H. Recommendation about problems: 

i. Use small number of categories, meaningful 
choice, reasonable balance. 

ii. With several variables, two options: 

a. Compare log-log curves from combinations of 
categories. 

b. Adjust for predictors already satisfying PH 
assumption. 

IV. Graphical approach 2: observed versus expected
plots

A. Graphical analog of the GOF test. 

B. Two strategies 

i. One-at-a-time: uses KM curves to obtain 
observed plots. 

ii. Adjusting for other variables: uses stratified Cox 
PH model to obtain observed plots. 
C. Expected plots obtained by fitting a Cox model
containing the predictor being assessed; substitute 
into the fitted model the value for each category of 
the predictor to obtain the expected value for each 
category. 

D. If observed and expected plots are close, conclude 
PH assumption is reasonable. 

E. Drawback: how close is close? 

F. Recommend: conclude PH not satisfied only if plots 
are strongly discrepant. 

G. Another drawback: what to do if assessing 
continuous variable. 

H. Recommend for continuous variable: 

i. Form strata from categories. 

ii. Observed plots are KM curves for each category. 

iii. Two options for expected plots: 

a. Use PH model with k — 1 dummy variables 
for k categories. 

b. Use PH model with continuous predictor and 
specify predictor values that distinguish 
categories. 

V. The goodness-of-fit (GOF) testing approach

A. Appealing approach because 

i. provides a test statistic (p-value). 

ii. researcher can make clear-cut decision. 

B. References 

i. methodological: Schoenfeld (1982), Harrell and
Lee (1986).

ii. SAS and Stata use different GOF formulae. 

C. The method: 

i. Schoenfeld residuals for each predictor uses a 
chi-square statistic with 1 df. 

ii. Correlations between Schoenfeld’s residuals and 
ranked failure times. 

iii. If p-value small, then departure from PH. 

D. Examples using remission data. 

E. Drawbacks: 

i. global test: may fail to detect a specific kind of 
departure from PH; recommend using both 
graphical and GOF methods. 

ii. several strategies to choose from, with no one 
strategy clearly preferable (one-at-a-time, all 
variables, each variable adjusted for others). 
VI. Assessing the PH assumption (using time-
dependent covariates)

A. Use extended Cox model: contains product terms of
form X × g(t), where g(t) is function of time, e.g.,
g(t) = t, or log t, or heaviside function.

B. One-at-a-time model:

$$h(t,X) = h_0(t)\exp[\beta X + \delta X g(t)]$$

Test $H_0\colon \delta = 0$ using Wald or LR test (chi-square
with 1 df).

C. Evaluating several predictors simultaneously:

$$h(t,X) = h_0(t)\exp\left(\sum_{i=1}^{p}\left[\beta_i X_i + \delta_i X_i g_i(t)\right]\right)$$

where $g_i(t)$ is function of time for ith predictor. Test
$H_0\colon \delta_1 = \delta_2 = \cdots = \delta_p = 0$ using LR (chi-square)
test with p df.

D. Examples using remission data.

E. Two computer programs required for test:

i. Cox PH model program. 

ii. Extended Cox model program. 

F. Drawback: choice of g(t) not always clear; different 
choices may lead to different conclusions about PH 
assumption. 


Practice 

Exercises 


The dataset “vets.dat” considers survival times in days for 137 
patients from the Veteran’s Administration Lung Cancer Trial 
cited by Kalbfleisch and Prentice in their text (The Statistical 
Analysis of Survival Time Data, Wiley, pp. 223-224, 1980). The 
exposure variable of interest is treatment status (standard = 
1, test = 2). Other variables of interest as control variables 
are cell type (four types, defined by dummy variables), perfor¬ 
mance status, disease duration, age, and prior therapy status. 
Failure status is defined by the status variable (0 if censored, 
1 if died). A complete list of the variables is given below. 

Column 1: Treatment (standard = 1, test = 2) 

Column 2: Cell type 1 (large = 1, other = 0) 

Column 3: Cell type 2 (adeno = 1, other = 0) 

Column 4: Cell type 3 (small = 1, other = 0) 

Column 5: Cell type 4 (squamous = 1, other = 0) 
Column 6: Survival time (days) 
Column 7: Performance status (0 = worst, ...,
100 = best)

Column 8: Disease duration (months) 

Column 9: Age 

Column 10: Prior therapy (none = 0, some = 10) 

Column 11: Status (0 = censored, 1 = died) 

1. State the hazard function form of the Cox PH model that 
describes the effect of the treatment variable and controls for 
the variables, cell type, performance status, disease duration, 
age, and prior therapy. In stating this model, make sure to 
incorporate the cell type variable using dummy variables, but 
do not consider possible interaction variables in your model. 

2. State three general approaches that can be used to evaluate 
whether the PH assumption is satisfied for the variables in¬ 
cluded in the model you have given in question 1. 

3. The following printout is obtained from fitting a Cox PH 
model to these data. Using the information provided, 
what can you conclude about whether the PH assumption 
is satisfied for the variables used in the model? Explain briefly. 


Cox regression

                     Coef.    Std. Err.   P > |z|   Haz. Ratio   [95% Conf. Interval]   P(PH)

Treatment             0.290   0.207       0.162     1.336        0.890    2.006         0.628
Large cell            0.400   0.283       0.157     1.491        0.857    2.594         0.033
Adeno cell            1.188   0.301       0.000     3.281        1.820    5.915         0.081
Small cell            0.856   0.275       0.002     2.355        1.374    4.037         0.078
Performance status   -0.033   0.006       0.000     0.968        0.958    0.978         0.000
Disease duration      0.000   0.009       0.992     1.000        0.982    1.018         0.919
Age                  -0.009   0.009       0.358     0.991        0.974    1.010         0.198
Prior therapy         0.007   0.023       0.755     1.007        0.962    1.054         0.145


4. For the variables used in the PH model in question 3, describe 
a strategy for evaluating the PH assumption using log-log 
survival curves for variables considered one-at-a-time. 

5. Again considering the variables used in question 3, describe 
a strategy for evaluating the PH assumption using log-log 
survival curves that are adjusted for other variables in the 
model. 
6. For the variable “performance status,” describe how you
would evaluate the PH assumption using observed versus
expected survival plots.

7. For the variable “performance status,” log-log plots which 
compare high (>50) with low (<50) are given by the follow¬ 
ing graph. Based on this graph, what do you conclude about 
the PH assumption with regard to this variable? 



8. What are some of the drawbacks of using the log-log ap¬ 
proach for assessing the PH assumption and what do you 
recommend to deal with these drawbacks? 

9. For the variable “performance status,” observed versus ex¬ 
pected plots that compare high (>50) with low (<50) are 
given by the following graph. Based on this graph, what do 
you conclude about the PH assumption with regard to this 
variable? 



[Figure: observed versus expected plots for performance
status, high versus low; time axis from 0 to 1000]


10. State the form of an extended Cox model that allows for 
the one-at-a-time assessment of the PH assumption for the 
variable “performance status,” and describe how you would 
carry out a statistical test of the assumption for this variable. 

11. State the form of an extended Cox model that allows for the 
simultaneous assessment of the PH assumption for the vari¬ 
ables, treatment, cell type, performance status, disease du¬ 
ration, age, and prior therapy. For this model, describe how 
you would carry out a statistical test of the PH assump¬ 
tion for these variables. Also, provide a strategy for assess¬ 
ing which of these variables satisfy the PH assumption and 
which do not using the extended Cox model approach. 
12. Using any of the information provided above and any addi¬ 
tional analyses that you perform with this dataset, what do 
you conclude about which variables satisfy the PH assump¬ 
tion and which variables do not? In answering this question, 
summarize any additional analyses performed. 


Test

The following questions consider a dataset from a study by
Caplehorn et al. (“Methadone Dosage and Retention of Pa¬ 
tients in Maintenance Treatment,” Med. J. Aust., 1991). These 
data comprise the times in days spent by heroin addicts from 
entry to departure from one of two methadone clinics. There 
are two additional covariates, namely, prison record and max¬ 
imum methadone dose, believed to affect the survival times. 
The dataset name is addicts.dat. A listing of the variables is 
given below: 

Column 1: Subject ID
Column 2: Clinic (1 or 2)
Column 3: Survival status (0 = censored, 1 = departed from clinic)
Column 4: Survival time in days
Column 5: Prison record (0 = none, 1 = any)
Column 6: Maximum methadone dose (mg/day)

1. The following edited printout was obtained from fitting 
a Cox PH model to these data: 


Cox regression
Analysis time _t: survt

         Coef.    Std. Err.   P > |z|   Haz. Ratio   [95% Conf. Interval]   P(PH)
Clinic   -1.009   0.215       0.000     0.365        0.239   0.556          0.001
Prison    0.327   0.167       0.051     1.386        0.999   1.924          0.332
Dose     -0.035   0.006       0.000     0.965        0.953   0.977          0.347

No. of subjects: 238    Log likelihood = -673.403


Based on the information provided in this printout, what 
do you conclude about which variables satisfy the PH 
assumption and which do not? Explain briefly. 
2. Suppose that for the model fit in question 1, log-log
survival curves for each clinic adjusted for prison and 
dose are plotted on the same graph. Assume that these 
curves are obtained by substituting into the formula for 
the estimated survival curve the values for each clinic 
and the overall mean values for the prison and dose 
variables. Below, we show these two curves. Are they 
parallel? Explain your answer. 



3. The following printout was obtained from fitting a 
stratified Cox PH model to these data, where the 
variable being stratified is clinic: 


Stratified Cox regression
Analysis time _t: survt (in days)

         Coef.    Std. Err.   P > |z|   Haz. Ratio   [95% Conf. Interval]
Prison    0.389   0.169       0.021     1.475        1.059   2.054
Dose     -0.035   0.006       0.000     0.965        0.953   0.978

No. of subjects = 238    Log likelihood = -597.714    Stratified by clinic


Using the above fitted model, we can obtain the log-log
curves below that compare the log-log survival for each
clinic (i.e., stratified by clinic) adjusted for the variables
prison and dose. Using these curves, what do you conclude
about whether or not the clinic variable satisfies
the PH assumption? Explain briefly.
4. Consider the two plots of log-log curves below that
compare the log-log survival for the prison variable 
ignoring other variables and adjusted for the clinic 
and dose variables. Using these curves, what do you 
conclude about whether or not the prison variable 
satisfies the PH assumption? Explain briefly. 


Log-log curves for prison 
ignoring other variables 
(i.e., using log-log KM curves) 



Log-log curves for prison 
adjusted for clinic and dose 
(i.e., stratified by prison) 



5. How do your conclusions from question 1 compare with
your conclusions from question 4? If the conclusions differ,
which conclusion do you prefer? Explain.

6. Describe briefly how you would evaluate the PH assumption
for the variable maximum methadone dose using
observed versus expected plots.

7. State an extended Cox model that would allow you to assess
the PH assumption for the variables clinic, prison,
and dose simultaneously. For this model, state the null
hypothesis for the test of the PH assumption and describe
how the likelihood ratio statistic would be obtained
and what its degrees of freedom would be under
the null hypothesis.

8. State at least one drawback to the use of the extended
Cox model approach described in question 7.

9. State an extended Cox model that would allow you to
assess the PH assumption for the variable clinic alone,
assuming that the prison and dose variables already satisfy
the PH assumption. For this model, state the null
hypothesis for the test of the PH assumption, and describe
how the likelihood ratio (LR) statistic would be
obtained. What are the degrees of freedom of the LR test
under the null hypothesis?
10. Consider the situation described in question 9, where
you wish to use an extended Cox model that would allow
you to assess the PH assumption for the variable clinic
alone, assuming that the assumption is satisfied for the
prison and dose variables. Suppose you use the following
extended Cox model:

$$h(t,\mathbf{X}) = h_0(t)\exp[\beta_1(\text{prison}) + \beta_2(\text{dose}) + \beta_3(\text{clinic}) + \delta_1(\text{clinic})\,g(t)]$$

where g(t) is defined as follows:

$$g(t) = \begin{cases} 1 & \text{if } t > 365 \text{ days} \\ 0 & \text{if } t \le 365 \text{ days} \end{cases}$$

For the above model, what is the formula for the hazard
ratio that compares clinic 1 to clinic 2 when t is
greater than 365 days? when t is less than or equal to
365 days? In terms of the hazard ratio formulae just described,
what specific departure from the PH assumption
is being tested when the null hypothesis is $H_0\colon \delta_1 = 0$?


Answers to Practice Exercises


1. $$h(t,\mathbf{X}) = h_0(t)\exp[\beta_1(\text{treatment}) + \beta_2(\text{CT1}) + \beta_3(\text{CT2}) + \beta_4(\text{CT3}) + \beta_5(\text{PS}) + \beta_6(\text{DD}) + \beta_7(\text{Age}) + \beta_8(\text{PT})]$$

where CTi denotes the cell type i dummy variable, PS denotes
the performance status variable, DD denotes the disease
duration variable, and PT denotes the prior therapy
variable.

2. The three general approaches for assessing the PH assumption
for the above model are:

(a) graphical, using either log-log plots or observed 
versus expected plots; 

(b) statistical test; 

(c) an extended Cox model containing product terms 
involving the variables being assessed with some 
function(s) of time. 

3. The P(PH) values given in the printout provide goodness-of-fit
tests for each variable in the fitted model adjusted for
the other variables in the model. The P(PH) values shown
indicate that the large cell type variable and the performance
status variable do not satisfy the PH assumption,
whereas the treatment, age, disease duration, and prior
therapy variables satisfy the PH assumption, and the adeno
and small cell type variables are of borderline significance.
4. A strategy for evaluating the PH assumption using log-log
survival curves for variables considered one-at-a-time is
given as follows:

For each variable separately, obtain a plot of log-log
Kaplan-Meier curves for the different categories of that
variable. For the cell type variable, this requires obtaining
a plot of four log-log KM curves, one for each cell type.
(Note that this is not the same as obtaining four separate
plots of two log-log curves, where each plot corresponds
to one of the dummy variables used in the model.) For
the variables PS, DD, and Age, which are interval variables,
each variable must be separately categorized into
two or more groups (say, low versus high values), and
KM curves are obtained for each group. For the variable
PT, which is a dichotomous variable, two log-log curves
are obtained which compare the “none” versus “some”
groups.

Among these plots (one for each variable), those that
are noticeably nonparallel indicate variables which do not
satisfy the PH assumption. The remaining variables are assumed
to satisfy the PH assumption.
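The one-at-a-time strategy above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are hypothetical, the function names `kaplan_meier` and `log_log` are invented for this sketch, and the log-log transform is taken here as −ln(−ln S). In practice the curves would be computed once per category of a variable and plotted together to judge parallelism.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (events: 1 = event, 0 = censored)."""
    surv = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)   # subjects still under observation at t
        failed = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - failed / at_risk
        curve.append((t, surv))
    return curve

def log_log(curve):
    """Transform S(t) to -ln(-ln S(t)); points with S = 0 or S = 1 are undefined and skipped."""
    return [(t, -math.log(-math.log(s))) for t, s in curve if 0.0 < s < 1.0]

# Hypothetical data for one category of a predictor (not from the text).
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
km = kaplan_meier(times, events)
ll = log_log(km)
```

Repeating the two calls for each category (for example, each of the four cell types) gives the set of curves to be compared on one graph.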

5. One strategy for evaluating the PH assumption for each
variable adjusted for the others is to use adjusted log-log
survival curves instead of KM curves separately for each of
the variables in the model. That is, for each variable separately,
a stratified Cox model is fit stratifying on the given
variable while adjusting for the other variables. Those variables
that yield adjusted log-log plots that are noticeably
nonparallel are then to be considered as not satisfying the
PH assumption. The remaining variables are assumed to
satisfy the PH assumption.

A variation of the above strategy uses adjusted log-log
curves for only those variables not satisfying the PH assumption
from a one-at-a-time approach, adjusting for
those variables satisfying the PH assumption from the
one-at-a-time approach. This second iteration would flag a subset
of the one-at-a-time flagged variables for further iteration.
At each new iteration, those variables found to satisfy
the assumption get added to the list of variables previously
determined to satisfy the assumption.




6. For the performance status (PS) variable, observed plots
are obtained by categorizing the variable into strata (say,
two strata: low versus high) and then obtaining KM survival
plots for each stratum. Expected plots can be obtained
by fitting a Cox model containing the (continuous)
PS variable and then obtaining estimated survival curves
for values of the performance status (PS) variable that represent
summary descriptive statistics for the strata previously
identified. For example, if there are two strata, say,
high (PS ≥ 50) and low (PS < 50), then the values of PS to
be used could be the mean or median PS score for persons
in the high stratum and the mean or median PS score for
persons in the low stratum.

An alternative method for obtaining expected plots involves
first dichotomizing the PS variable (say, into high
and low groups) and then fitting a Cox model containing
the dichotomized PS variable instead of the original
continuous variable. The expected survival plots for each
group are estimated survival curves obtained for each value
of the dichotomized PS variable.

Once observed and expected plots are obtained for each
stratum of the PS variable, they are then compared on the
same graph to determine whether or not corresponding
observed and expected plots are “close.” If it is determined
that, overall, comparisons for each stratum are close, then
it is concluded that the PH assumption is satisfied for the
PS variable. In determining how close is close, the researcher
should look for noticeably discrepant observed
versus expected plots.

7. The log-log plots that compare high versus low PS groups
(ignoring other variables) are arguably parallel early in
follow-up, and are not comparable later because survival
times for the two groups do not overlap after 400 days.
These plots do not strongly indicate that the PH assumption
is violated for the variable PS. This contradicts the
conclusion previously obtained for the PS variable using
the P(PH) results.

8. Drawbacks of the log-log approach are: 

• How parallel is parallel? 

• How to categorize a continuous variable? 

• How to evaluate several variables simultaneously? 
Recommendations about problems:

• Look for noticeable nonparallelism; otherwise PH assumption
is OK.

• For continuous variables, use a small number of categories,
a meaningful choice of categories, and a reasonable
balance in sample size for categories.

• With several variables, there are two options:

i. Compare log-log curves from combinations of categories.

ii. Adjust for predictors already satisfying PH assumption.

9. The observed and expected plots are relatively close for
low and high groups separately, although there is somewhat
more discrepancy for the high group than for the low
group. Deciding how close is close is quite subjective for
these plots. Nevertheless, because there are no major discrepancies
for either low or high groups, we consider the
PH assumption satisfied for this variable.

10. $$h(t,\mathbf{X}) = h_0(t)\exp[\beta_1(\text{PS}) + \delta(\text{PS})\,g(t)]$$

where g(t) is a function of t, such as g(t) = t, or g(t) =
log t, or a heaviside function. The PH assumption is tested
using a 1 df Wald or LR statistic for $H_0\colon \delta = 0$.

11. $$
\begin{aligned}
h(t,\mathbf{X}) = h_0(t)\exp[&\beta_1(\text{treatment}) + \beta_2(\text{CT1}) + \beta_3(\text{CT2}) + \beta_4(\text{CT3}) \\
&+ \beta_5(\text{PS}) + \beta_6(\text{DD}) + \beta_7(\text{Age}) + \beta_8(\text{PT}) \\
&+ \delta_1(\text{treatment} \times g(t)) + \delta_2(\text{CT1} \times g(t)) + \delta_3(\text{CT2} \times g(t)) \\
&+ \delta_4(\text{CT3} \times g(t)) + \delta_5(\text{PS} \times g(t)) + \delta_6(\text{DD} \times g(t)) \\
&+ \delta_7(\text{Age} \times g(t)) + \delta_8(\text{PT} \times g(t))]
\end{aligned}
$$

where g(t) is some function of time, such as g(t) = t, or
g(t) = log t, or a heaviside function. To test the PH assumption
simultaneously for all variables, the null hypothesis
is stated as $H_0\colon \delta_1 = \delta_2 = \cdots = \delta_8 = 0$. The test statistic
is a likelihood-ratio statistic of the form

$$LR = -2\ln L_R - (-2\ln L_F)$$

where R denotes the reduced (PH) model obtained when
all δ’s are 0, and F denotes the full model given above. Under
$H_0$, the LR statistic is approximately chi-square with
8 df.
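The arithmetic of this test can be sketched as follows. The log likelihoods used here are hypothetical placeholders, not results from the dataset, and the function names are invented for the sketch. The upper-tail chi-square probability uses the standard closed form that holds for even degrees of freedom, which covers the 8 df case.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate; closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

def lr_statistic(loglik_reduced, loglik_full):
    """LR = -2 ln L_R - (-2 ln L_F)."""
    return -2.0 * loglik_reduced - (-2.0 * loglik_full)

# Hypothetical log likelihoods, for illustration only (not results from the text).
lr = lr_statistic(loglik_reduced=-480.0, loglik_full=-470.0)
p_value = chi2_sf_even_df(lr, df=8)
```

A small P-value would suggest that at least one of the eight product terms is needed, that is, that the PH assumption fails for at least one variable.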
12. The question here is somewhat open-ended, leaving the
reader the option to explore additional graphical, GOF, or
extended Cox model approaches for evaluating the PH assumption
for the variables in the model. The conclusions
from the GOF statistics provided in question 3 are likely to
hold up under further scrutiny, so that a reasonable conclusion
is that cell type and performance status variables
do not satisfy the PH assumption, with the remaining variables
satisfying the assumption.


The Stratified Cox Procedure
Introduction

We begin with an example of the use of the stratified Cox
procedure for a single predictor that does not satisfy the PH
assumption. We then describe the general approach for fitting
a stratified Cox model, including the form of the (partial) likelihood
function used to estimate model parameters.

We also describe the assumption of no interaction that is
typically incorporated into most computer programs that
carry out the stratified Cox procedure. We show how the
no-interaction assumption can be tested, and what can be done
if interaction is found.

We conclude with a second example of the stratified Cox procedure
in which more than one variable is stratified.


Abbreviated Outline

The outline below gives the user a preview of the material to
be covered by the presentation. A detailed outline for review
purposes follows the presentation.

I. Preview

II. An Example

III. The General Stratified Cox (SC) Model

IV. The No-Interaction Assumption and How to Test It

V. A Second Example Involving Several Stratification
Variables

VI. A Graphical View of the Stratified Cox Approach

VII. Summary
Objectives


Upon completing the chapter, the learner should be able to:

1. Recognize a computer printout for a stratified Cox procedure.

2. State the hazard form of a stratified Cox model for a given
survival analysis scenario and/or a given set of computer
results for such a model.

3. Evaluate the effect of a predictor of interest based on computer
results from a stratified Cox procedure.

4. For a given survival analysis scenario and/or a given set
of computer results involving a stratified Cox model,

• state the no-interaction assumption for the given model;

• describe and/or carry out a test of the no-interaction
assumption;

• describe and/or carry out an analysis when the
no-interaction assumption is not satisfied.
I. Preview

Stratified Cox model:

• modification of Cox PH model

• stratification of predictor not
satisfying PH

• includes predictors satisfying PH


FOCUS: how stratification is carried out:

• computer results

• hazard function

• single predictor vs. ≥ 2 predictors

• no-interaction vs. interaction

The “stratified Cox model” is a modification of the
Cox proportional hazards (PH) model that allows
for control by “stratification” of a predictor that
does not satisfy the PH assumption. Predictors
that are assumed to satisfy the PH assumption are
included in the model, whereas the predictor being
stratified is not included.

In this presentation, we focus on how stratification
is carried out by describing the analysis of computer
results and the form of the hazard function
for a stratified Cox model. We first consider stratifying
on a single predictor and then later consider
stratifying on two or more predictors. Further, we
distinguish between the use of a “no-interaction”
version of the stratified Cox model and an alternative
approach that allows interaction.


II. An Example 


EXAMPLE


Clinical trial: 42 leukemia patients
Response: days in remission

          Coef.   Std. Err.   P(PH)
log WBC   1.594   0.330       0.828
Rx        1.391   0.457       0.935
Sex       0.263   0.449       0.031


• log WBC and Rx satisfy PH

• Sex does not satisfy PH

(Same conclusions using graphical
approaches)


Stratified Cox (SC):

• control for sex (stratified);

• simultaneously include log WBC and
Rx in the model

Consider the computer results shown here for a
Cox PH model containing the three variables, log
WBC, treatment group (Rx), and SEX. These results
derive from a clinical trial of 42 leukemia
patients, where the response of interest is days in
remission.


From the printout, the P(PH) values for log WBC
and treatment group are nonsignificant. However,
the P(PH) value for SEX is significant below the
.05 level. These results indicate that log WBC
and treatment group satisfy the PH assumption,
whereas the SEX variable does not. The same conclusions
regarding the PH assumption about these
variables would also be made using the graphical
procedures described earlier.

Because we have a situation where one of the
predictors does not satisfy the PH assumption,
we carry out a stratified Cox (SC) procedure
for the analysis. Using SC, we can control for
the SEX variable, which does not satisfy the
PH assumption, by stratification while simultaneously
including in the model the log WBC and
treatment variables, which do satisfy the PH assumption.
EXAMPLE (continued)


STATA OUTPUT USING SC:

Stratified Cox regression
Analysis time _t: survt

          Coef.   Std. Err.   p > |z|   Haz. Ratio   [95% Conf. Interval]
log WBC   1.390   0.338       0.000     4.016        2.072   7.783
Rx        0.931   0.472       0.048     2.537        1.006   6.396

No. of subjects = 42    Log likelihood = -57.560    Stratified by sex

Appendix A illustrates SC procedures
using Stata, SAS, and SPSS.

• Log WBC and Rx are included in SC
model.

• SC model is stratified by SEX.


Effect of Rx adjusted for log WBC and
SEX:

• Hazard ratio: 2.537 = $e^{0.931}$

• Interpretation: Placebo group
(Rx = 1) has 2.5 times the hazard as
the treatment group (Rx = 0)

95% CI for Rx (1.006, 6.396) indicates
considerable variability.

CI formula: exp(0.931 ± 1.96 × 0.472)


Wald test: P = 0.048 (two-tailed),
significant at the 0.05 level.

The computer results from a SC procedure are
shown here. These results come from the Stata
package. (See the Computer Appendix for running
a SC procedure in Stata, SAS, or SPSS.)


The computer results show that the log WBC and
Rx variables are included in the model listing,
whereas the SEX variable is not included; rather,
the model stratifies on the SEX variable, as indicated
at the bottom of the output. Note that the
SEX variable is being adjusted by stratification,
whereas log WBC is being adjusted by its inclusion
in the model along with Rx.

In the above output, we have also circled some key
information that can be used to assess the effect
of the Rx variable adjusted for both log WBC and
SEX. In particular, we can see that the hazard ratio
for the effect of Rx adjusted for log WBC and
SEX is given by the value 2.537. This value can be
obtained by exponentiating the coefficient 0.931
of the Rx variable. The hazard ratio value can be
interpreted to mean that the placebo group (for
which Rx = 1) has 2.5 times the hazard for going
out of remission as the treatment group (for which
Rx = 0).

Also, we can see from the output that a 95% confidence
interval for the effect of the Rx variable is
given by the limits 1.006 to 6.396. This is a fairly
wide range, thus indicating considerable variability
in the 2.537 hazard ratio point estimate. Note
that these confidence limits can be obtained by exponentiating
the quantity 0.931 plus or minus 1.96
times the standard error 0.472.

From the above output, a test for the significance
of the Rx variable adjusted for log WBC and SEX is
given by the Wald statistic P-value of 0.048. This is
a two-tailed P-value, and the test is just significant
at the 0.05 level.
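The hazard ratio, confidence limits, and Wald P-value described above can be reproduced from the printed coefficient and standard error. The sketch below uses only the rounded values shown in the output, so the results may differ from the printout in the last digit; the variable names are chosen for this illustration.

```python
import math

coef, se = 0.931, 0.472          # Rx coefficient and standard error from the printout

hazard_ratio = math.exp(coef)
ci_lower = math.exp(coef - 1.96 * se)
ci_upper = math.exp(coef + 1.96 * se)

z = coef / se                                    # Wald statistic
p_two_tailed = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) for a standard normal Z

print(f"HR = {hazard_ratio:.3f}, 95% CI = ({ci_lower:.3f}, {ci_upper:.3f}), P = {p_two_tailed:.3f}")
```

The interval is computed on the log hazard scale and then exponentiated, which is why it is not symmetric about 2.537.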
EXAMPLE (continued)


LR test: Output for reduced model

Stratified Cox regression
Analysis time _t: survt

          Coef.   Std. Err.   p > |z|   Haz. Ratio   [95% Conf. Interval]
log WBC   1.456   0.320       0.000     4.289        2.291   8.03

No. of subjects = 42    Log likelihood = -59.648    Stratified by sex

LR = (-2 × -59.648) - (-2 × -57.560)
   = 119.296 - 115.120 = 4.179 (P < 0.05)

LR and Wald give same conclusion.

Hazard function for stratified Cox
model:

$$h_g(t,\mathbf{X}) = h_{0g}(t)\exp[\beta_1\,\text{Rx} + \beta_2 \log \text{WBC}], \quad g = 1, 2$$

g denotes stratum #.

SC model for males and females:

Females (g = 1):

$$h_1(t,\mathbf{X}) = h_{01}(t)\exp[\beta_1\,\text{Rx} + \beta_2 \log \text{WBC}]$$

Males (g = 2):

$$h_2(t,\mathbf{X}) = h_{02}(t)\exp[\beta_1\,\text{Rx} + \beta_2 \log \text{WBC}]$$

Rx and log WBC in the model
Sex not in the model (stratified)

HR for effect of Rx adjusted for log WBC
and sex:

$$e^{\hat{\beta}_1}$$

where $\beta_1$ is the coefficient of Rx.


An alternative test involves a likelihood ratio (LR)
statistic that compares the above model (full
model) with a reduced model that does not contain
the Rx variable. The output for the reduced
model is shown here. The log-likelihood statistic
for the reduced model is -2 times -59.648,
which is to be compared with the log-likelihood
statistic of -2 times -57.560 for the full model.

The LR statistic is therefore 119.296 minus
115.120, which equals 4.179. Under $H_0$, this
statistic has a chi-square distribution with one
degree of freedom and is significant at the 0.05
level. Thus, the LR and Wald tests lead to the
same conclusion.
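The LR computation can be checked directly from the two printed log likelihoods. Note that with the rounded values as printed the difference comes to 4.176; the reported 4.179 presumably reflects unrounded log likelihoods, which is an assumption on our part. The P-value below relies on the fact that a chi-square variate with 1 df is the square of a standard normal variate.

```python
import math

loglik_full = -57.560      # model with Rx and log WBC, stratified by sex
loglik_reduced = -59.648   # model with log WBC only, stratified by sex

lr = -2.0 * loglik_reduced - (-2.0 * loglik_full)

# For 1 df, P(chi-square > x) equals P(|Z| > sqrt(x)) for a standard normal Z.
p_value = math.erfc(math.sqrt(lr / 2.0))
```

Either value of the statistic exceeds 3.84, the 0.05 critical value for 1 df, so the conclusion is unchanged.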

So far, we have illustrated the results from a stratified
Cox procedure without actually describing
the model form being used. For the remission
data example, we now present the hazard function
form for the stratified Cox model, as shown
here. This hazard function formula contains a
subscript g that indicates the gth stratum.

Thus, in our remission data example, where we
have stratified on SEX, g takes on one of two
values, so that we have a different baseline hazard
function for males and females.

Notice that the hazard function formula contains
the variables Rx and log WBC, but does not
contain the variable SEX. SEX is not included
in the model because it doesn't satisfy the PH
assumption. So, instead, the SEX variable is
controlled by stratification.

Because the variables Rx and log WBC are
included in the model, we can estimate the effect
of each variable adjusted for the other variable
and the SEX variable using standard exponential
hazard ratio expressions. For example, the estimated
hazard ratio for the effect of Rx, adjusted
for log WBC and SEX, is given by e to the $\beta_1$ “hat,”
where $\beta_1$ is the coefficient of the Rx variable.
EXAMPLE (continued)


Cannot estimate HR for SEX variable
(SEX doesn't satisfy PH).


Different baseline hazard functions:
$h_{01}(t)$ for females and $h_{02}(t)$ for males.

Same coefficients $\beta_1$ and $\beta_2$ for both
female and male models.


Different baselines:
$h_{01}(t) \Rightarrow$ survival curve for females
$h_{02}(t) \Rightarrow$ survival curve for males


Females and males:

same $\beta_1$ and $\beta_2$ $\Rightarrow$ same HR’s, e.g., $e^{\hat{\beta}_1}$

No-interaction assumption
(see Section IV)


Estimates of $\beta_1$ and $\beta_2$:

Maximize partial likelihood (L),
where $L = L_1 \times L_2$

$L_1$ is the likelihood for females derived
from $h_1(t)$,

and $L_2$ is the likelihood for males derived
from $h_2(t)$.


Nevertheless, because the SEX variable is not 
included in the model, it is not possible to obtain 
a hazard ratio value for the effect of SEX adjusted 
for the other two variables. This is the price to be 
paid for stratification on the SEX variable. Note 
that a single value for the hazard ratio for SEX 
is not appropriate if SEX doesn’t satisfy the PH 
assumption, because the hazard ratio must then 
vary with time. 

Notice also that the hazard functions for males
and females differ only insofar as they have
different baseline hazard functions, namely,
$h_{01}(t)$ for females and $h_{02}(t)$ for males. However,
the coefficients $\beta_1$ and $\beta_2$ are the same for both
female and male models.

Because there are different baseline hazard
functions, the fitted stratified Cox model will yield
different estimated survival curves for females
and males. These curves will be described shortly.

Note, however, that because the coefficients of Rx
and log WBC are the same for females and males,
estimates of hazard ratios, such as e to the $\beta_1$
“hat,” are the same for both females and males.
This feature of the stratified Cox model is called
the “no-interaction” assumption. It is possible
to evaluate whether this assumption is tenable
and to modify the analysis if not tenable. We will
discuss this assumption further in Section IV.

To obtain estimates of $\beta_1$ and $\beta_2$, a (partial)
likelihood function (L) is formed from the model
and the data; this function is then maximized
using computer iteration. The likelihood function
(L) for the stratified Cox (SC) model is different
from the nonstratified Cox model. For the SC
model, L is obtained by multiplying together
likelihood functions for each stratum. Thus, L
is equal to the product of $L_1$ and $L_2$, where $L_1$
and $L_2$ denote the female and male likelihood
functions, respectively, which are derived from
their respective hazard functions $h_1(t)$ and $h_2(t)$.
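The product form of the likelihood can be illustrated with a minimal sketch. It assumes no tied event times, uses hypothetical data rather than the remission data, and the function names are invented here. On the log scale the product of stratum likelihoods becomes a sum, and each risk set is formed only from subjects in the same stratum, which is how the separate baseline hazards drop out.

```python
import math

def cox_log_partial_likelihood(beta, rows):
    """Log partial likelihood for one stratum; rows are (time, event, covariates), no tied event times."""
    def score(x):
        return sum(b * xi for b, xi in zip(beta, x))
    total = 0.0
    for t_i, event_i, x_i in rows:
        if event_i == 1:
            risk = [score(x_j) for t_j, _, x_j in rows if t_j >= t_i]   # risk set at t_i
            total += score(x_i) - math.log(sum(math.exp(s) for s in risk))
    return total

def stratified_log_partial_likelihood(beta, strata):
    """ln L = ln L_1 + ln L_2 + ...: same beta in every stratum, risk sets formed within strata."""
    return sum(cox_log_partial_likelihood(beta, rows) for rows in strata.values())

# Hypothetical data: (time, event, [Rx, log WBC]) -- not the remission data.
strata = {
    "female": [(5, 1, [1, 2.0]), (8, 1, [0, 1.5]), (12, 0, [0, 3.0])],
    "male":   [(3, 1, [1, 2.5]), (9, 0, [0, 2.0]), (11, 1, [1, 1.0])],
}
value = stratified_log_partial_likelihood([0.5, 1.0], strata)
```

A fitting routine would maximize this function over the coefficients; the maximization step itself is not shown.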
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EXAMPLE (continued) 


Adjusted Survival Curves for Rx 
from Stratified Cox Model 
(adjusted for log WBC) 



As mentioned above, adjusted survival curves can 
be obtained for each stratum as shown here. Here 
we have shown four survival curves because we 
want to compare the survival for two treatment 
groups over each of two strata. 

If we compare treatment and placebo group separately
by sex, we can see that the treatment group
has consistently better survival prognosis than the
placebo group for females and males separately.
This supports our findings about the hazard ratio
for the treatment effect derived earlier from the
computer results for the stratified Cox model.


III. The General Stratified
Cox (SC) Model

Example: one binary predictor

General: several predictors, several
strata

$Z_1, Z_2, \ldots, Z_k$ do not satisfy PH
$X_1, X_2, \ldots, X_p$ satisfy PH


In the previous example, we illustrated the SC 
model for one binary predictor not satisfying the 
PH assumption. We now describe the general form 
of the SC model that allows for stratification of 
several predictors over several strata. 

We assume that we have k variables not satisfying
the PH assumption and p variables satisfying the
PH assumption. The variables not satisfying
the PH assumption we denote as $Z_1, Z_2, \ldots, Z_k$;
the variables satisfying the PH assumption we denote
as $X_1, X_2, \ldots, X_p$.


Define a single new variable Z*:

1. categorize each $Z_i$

2. form combinations of categories
(strata)

3. the strata are the categories of Z*


To perform the stratified Cox procedure, we define
a single new variable, which we call Z*, from
the Z’s to be used for stratification. We do this by
forming categories of each $Z_i$, including those $Z_i$
that are interval variables. We then form combinations
of categories, and these combinations are
our strata. These strata are the categories of the
new variable Z*.


EXAMPLE

                        Age
Treatment status   Young   Middle   Old
Placebo            1       2        3
Treatment          4       5        6

Z* = new variable with six categories
Stratify on Z*


For example, suppose k is 2, and the two Z’s are
age (an interval variable) and treatment status
(a binary variable). Then we categorize age into,
say, three age groups: young, middle, and old. We
then form six age group-by-treatment-status combinations,
as shown here. These six combinations
represent the different categories of a single new
variable that we stratify on in our stratified Cox
model. We call this new variable Z*.
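As a rough sketch of this construction, the six categories of Z* can be generated as treatment-by-age combinations and each subject mapped to one of them. The age cutpoints and the function names below are hypothetical; the text does not specify how age is to be grouped.

```python
from itertools import product

treatment_levels = ["placebo", "treatment"]
age_levels = ["young", "middle", "old"]

# Number the treatment-by-age combinations 1..6 in the order of the table.
stratum_of = {combo: g for g, combo in enumerate(product(treatment_levels, age_levels), start=1)}

def age_group(age, cutpoints=(40, 60)):
    """Categorize age; the cutpoints are hypothetical and not taken from the text."""
    if age < cutpoints[0]:
        return "young"
    return "middle" if age < cutpoints[1] else "old"

def z_star(treated, age):
    """Map one subject to a category of Z* (treated: 0 = placebo, 1 = treatment)."""
    return stratum_of[(treatment_levels[treated], age_group(age))]
```

The resulting stratum number is what would be supplied to a program's stratification option in place of the original Z variables.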
Z* has k* categories where k* =
total # of combinations (strata), e.g.,
k* = 6 in above example.


In general, the stratification variable Z* will have
k* categories, where k* is the total number of
combinations (or strata) formed after categorizing
each of the Z’s. In the above example, k* is
equal to 6.


The general SC model:

$$h_g(t,\mathbf{X}) = h_{0g}(t)\exp[\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p]$$

$g = 1, 2, \ldots, k^*$, strata defined
from Z*


We now present the general hazard function form 
for the stratified Cox model, as shown here. This 
formula contains a subscript g which indicates the 
gth stratum. The strata are defined as the different 
categories of the stratification variable Z*, and the 
number of strata equals k*. 


Z* not included in the model

$X_1, X_2, \ldots, X_p$ included in the
model


Note that the variable Z* is not explicitly included
in the model but that the X’s, which are assumed
to satisfy the PH assumption, are included in the
model.


Different baseline hazard functions:
$h_{0g}(t)$, $g = 1, 2, \ldots, k^*$

Same coefficients: $\beta_1, \beta_2, \ldots, \beta_p$


Different baselines yield different survival curves:

$\hat{h}_{01}(t) \Rightarrow \hat{S}_1(t)$
$\hat{h}_{02}(t) \Rightarrow \hat{S}_2(t)$
$\vdots$
$\hat{h}_{0k^*}(t) \Rightarrow \hat{S}_{k^*}(t)$


Note also that the baseline hazard function $h_{0g}(t)$
is allowed to be different for each stratum. However,
the coefficients $\beta_1, \beta_2, \ldots, \beta_p$ are the same
for each stratum.

As previously described by example, the fitted
SC model will yield different estimated survival
curves for each stratum because the baseline hazard
functions are different for each stratum.


HR same for each stratum

(no-interaction assumption, Section IV)


However, because the coefficients of the X's are the 
same for each stratum, estimates of hazard ratios 
are the same for each stratum. This latter feature 
of the SC model is what we previously have called 
the “no-interaction” assumption to be discussed 
further in Section IV. 


(Partial) likelihood function:

$$L = L_1 \times L_2 \times \cdots \times L_{k^*}$$

Strata:       1            2            ...   k*
Likelihood:   $L_1$        $L_2$        ...   $L_{k^*}$
Hazard:       $h_1(t,\mathbf{X})$   $h_2(t,\mathbf{X})$   ...   $h_{k^*}(t,\mathbf{X})$

To obtain estimates of the regression coefficients
$\beta_1, \beta_2, \ldots, \beta_p$, we maximize a (partial) likelihood
function L that is obtained by multiplying together
likelihood functions for each stratum, as shown
here. Thus, L is equal to the product of $L_1$ times
$L_2$, and so on, up until $L_{k^*}$, where the subscripted
L’s denote the likelihood functions for different
strata, with each of these L’s being derived from
its corresponding hazard function.
IV. The No-Interaction
Assumption and How
to Test It

Stratified Cox model

$$h_g(t,\mathbf{X}) = h_{0g}(t)\exp[\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p]$$

β coefficients do not vary over
strata (no-interaction assumption)

• how to evaluate

• what to do if violated


EXAMPLE 

No-interaction SC model: 


Stratified Cox regression
Analysis time _t: survt

            Coef.   Std. Err.   p > |z|   Haz. Ratio   [95% Conf. Interval]
log WBC     1.390   0.338       0.000     4.016        2.072    7.783
Rx          0.931   0.472       0.048     2.537        1.006    6.396

No. of subjects = 42   Log likelihood = -57.560   Stratified by sex

Interaction by fitting separate models:
Cox regression (Females)
Analysis time _t: survt

Column name   Coeff   StErr.   P-value   HR      0.95 CI            P(PH)
4 log WBC     1.639   0.519    0.002     5.150   1.862   14.242     0.228
5 Rx          1.859   0.729    0.011     6.418   1.537   26.790     0.603

No. of subjects = 20   Log likelihood = -22.100

Cox regression (Males)
Analysis time _t: survt

Column name   Coeff   StErr.   P-value   HR      0.95 CI            P(PH)
4 log WBC     1.170   0.499    0.019     3.222   1.213   8.562      0.674
5 Rx          0.267   0.566    0.637     1.306   0.431   3.959      0.539

No. of subjects = 22   Log likelihood = -33.736

Which model is more appropriate 
statistically? 



We previously pointed out that the SC model contains
regression coefficients, denoted as β's, that
do not vary over the strata. We have called this
property of the model the “no-interaction assumption.”
In this section, we explain what this assumption
means. We also describe how to evaluate the
assumption and what to do if the assumption is
violated.


We return to the SC output previously illustrated.
Notice that only one set of coefficients, namely,
1.390 for log WBC and 0.931 for Rx, is provided,
even though there are two strata, one for females
and one for males. These results assume no
interaction of the sex variable with either log
WBC or Rx.

If we allow for interaction, then we would 
expect to obtain different coefficients for each 
of the (SEX) strata. This would happen if we fit 
separate hazard models to the female and male 
data, with each model containing the log WBC 
and Rx variables. The computer results from 
fitting separate models are shown here. 

Notice that the coefficient of log WBC is 1.639 for 
females but is 1.170 for males. Also, the coefficient 
for Rx is 1.859 for females but 0.267 for males. 
These results show different coefficients for 
females than for males, particularly for the Rx 
variable. 

But are corresponding coefficients statistically
different? That is, which model is more appropriate
statistically, the no-interaction model or the
interaction model? To answer this question, we
must first look at the hazard function model for
the interaction situation.
EXAMPLE (continued)

Interaction model:

(♦) h_g(t,X) = h_{0g}(t) exp[β_{1g} log WBC + β_{2g} Rx]
where g = 1 (females), g = 2 (males)

No-interaction model:

h_g(t,X) = h_{0g}(t) exp[β_1 log WBC + β_2 Rx]
where g = 1 (females), g = 2 (males)

Alternative interaction model:

(★) h_g(t,X) = h_{0g}(t) exp[β*_1 log WBC + β*_2 Rx
        + β*_3 (SEX x log WBC) + β*_4 (SEX x Rx)]

where SEX = 1 if female
            0 if male

h_{0g}(t) are different for g = 1, 2
β* coefficients do not involve g

Equivalence of models (♦) and (★):
g = 1 (females), so that sex = 1:

h_1(t,X) = h_{01}(t) exp[β*_1 log WBC + β*_2 Rx
        + β*_3 (1 x log WBC) + β*_4 (1 x Rx)]
    = h_{01}(t) exp[(β*_1 + β*_3) log WBC + (β*_2 + β*_4) Rx]

g = 2 (males), so that sex = 0:

h_2(t,X) = h_{02}(t) exp[β*_1 log WBC + β*_2 Rx
        + β*_3 (0 x log WBC) + β*_4 (0 x Rx)]
    = h_{02}(t) exp[β*_1 log WBC + β*_2 Rx]

Interaction models in same format:

Females (g = 1): h_1(t,X)
(♦) = h_{01}(t) exp[β_11 log WBC + β_21 Rx]
(★) = h_{01}(t) exp[(β*_1 + β*_3) log WBC + (β*_2 + β*_4) Rx]

Males (g = 2): h_2(t,X)
(♦) = h_{02}(t) exp[β_12 log WBC + β_22 Rx]
(★) = h_{02}(t) exp[β*_1 log WBC + β*_2 Rx]


One way to state the hazard model formula when
there is interaction is shown here (♦). Notice
that each variable in this model has a different
coefficient for females than for males, as indicated
by the subscript g in the coefficients β_{1g} and β_{2g}.

In contrast, in the no-interaction model, the
coefficient (β_1) of log WBC is the same for
females and for males; also, the coefficient (β_2)
of Rx is the same for females and for males.

An alternative way to write the interaction model 
is shown here (★). This alternative form contains 
two product terms—SEX x log WBC and SEX x 
Rx —as well as the main effects of log WBC and 
Rx. We have coded the SEX so that 1 denotes 
female and 0 denotes male. 

In this alternative model, note that although the
baseline hazards h_{0g}(t) are different for each sex,
the β* coefficients do not involve the subscript g
and therefore are the same for each sex.

Nevertheless, this alternative formula (★) is 
equivalent to the interaction formula (♦) above. 
We show this by specifying the form that the 
model takes for g = 1 (females) and g = 2 
(males). 

Notice that the coefficients of log WBC are
different in each formula, namely, (β*_1 + β*_3) for
females versus β*_1 for males.

Similarly, the coefficients of Rx are different,
namely, (β*_2 + β*_4) for females versus β*_2 for
males.

The preceding formulae indicate that two seemingly
different formulae for the interaction model,
(♦) versus (★), shown earlier, can be
written in the same format. We show these
formulae here separately for females and males.
EXAMPLE (continued)

                    (♦)       (★)
Females (g = 1):    β_11  =   β*_1 + β*_3
                    β_21  =   β*_2 + β*_4

                    (♦)       (★)
Males (g = 2):      β_12  =   β*_1
                    β_22  =   β*_2


Stratified Cox regression
Analysis time _t: survt

                 Coef.   Std. Err.   p > |z|   Haz. Ratio   [95% Conf. Interval]
log WBC          1.170   0.499       0.019     3.222        1.213    8.562
Rx               0.267   0.566       0.637     1.306        0.431    3.959
Sex x log WBC    0.469   0.720       0.515     1.598        0.390    6.549
Sex x Rx         1.592   0.923       0.084     4.915        0.805    30.003

No. of subjects = 42   Log likelihood = -55.835   Stratified by sex

Females:
log WBC    β̂_11 = 1.639
           β̂*_1 + β̂*_3 = 1.170 + 0.469 = 1.639
Rx         β̂_21 = 1.859
           β̂*_2 + β̂*_4 = 0.267 + 1.592 = 1.859

Males:
log WBC    β̂_12 = 1.170 = β̂*_1
Rx         β̂_22 = 0.267 = β̂*_2

Interaction model:

h_g(t,X) = h_{0g}(t) exp[β*_1 log WBC + β*_2 Rx
        + β*_3 (SEX x log WBC)
        + β*_4 (SEX x Rx)]


Notice that for females, the coefficient β_11 in
model (♦) must be equivalent to (β*_1 + β*_3) in
model (★) because both models have the same format,
and both β_11 and (β*_1 + β*_3) are coefficients
of the same variable, log WBC. Similarly, β_21 in
model (♦) is equivalent to (β*_2 + β*_4) in model (★)
because both are coefficients of the same variable,
Rx.

For males, it follows in an analogous way
that the coefficient β_12 is equivalent to β*_1, and,
similarly, β_22 equals β*_2.

Here we provide computer results obtained
from fitting the alternative interaction model (★).
The estimated regression coefficients β̂*_1, β̂*_2, β̂*_3,
and β̂*_4, respectively, appear in the Coef. column.

We have shown above that the sums β̂*_1 + β̂*_3 and
β̂*_2 + β̂*_4 are equal to the coefficients β̂_11 and β̂_21,
respectively, in the original interaction model for
females.

Also, we have shown that β̂*_1 and β̂*_2 are equal
to the coefficients β̂_12 and β̂_22, respectively, in
the original interaction model for the males. The
numerical equivalences are shown here. Note
again that the coefficients of log WBC and Rx
for females are different from males, as is to be
expected if sex interacts with each variable.
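
The arithmetic behind these equivalences can be checked with a short Python sketch. The dictionary keys and the helper name are hypothetical; the four estimates are the ones printed for the product-term model (★), and SEX is coded 1 for females and 0 for males, as in the text.

```python
import math

# Estimates from the product-term model (★), as printed in the output.
star = {"logWBC": 1.170, "Rx": 0.267, "SEXxlogWBC": 0.469, "SEXxRx": 1.592}


def stratum_coefficients(sex):
    """Coefficients of log WBC and Rx implied by (★); sex = 1 female, 0 male."""
    return {
        "logWBC": star["logWBC"] + sex * star["SEXxlogWBC"],
        "Rx": star["Rx"] + sex * star["SEXxRx"],
    }


females = stratum_coefficients(1)
males = stratum_coefficients(0)

for label, coefs in (("females", females), ("males", males)):
    for name, b in coefs.items():
        print(f"{label:8s} {name:7s} coef = {b:.3f}  HR = {math.exp(b):.3f}")
```

The sums reproduce the coefficients from the separate female and male fits (1.639 and 1.859 for females; 1.170 and 0.267 for males). Hazard ratios recomputed from these three-decimal coefficients may differ from the printed ones in the last digit because of rounding.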

We have thus seen that the interaction model 
can be written in a format that contains product 
terms involving the variable being stratified— 
SEX—being multiplied by each of the predictors 
not being stratified. We show this model involving 
product terms again here. We will use this 
model to describe a test of the no-interaction 
assumption. 
EXAMPLE (continued)

Testing the no-interaction assumption:
LR = -2 ln L_R - (-2 ln L_F)

R = reduced (no-interaction) model
F = full (interaction) model

LR ~ χ²_{2 df} under H_0: no interaction
(2 df because two product terms tested
in interaction model)

No interaction (reduced model):
Output: -2 log L: 115.120    (-2 ln L_R)

Interaction (full model):
Output: -2 log L: 111.670    (-2 ln L_F)

LR = 115.120 - 111.670 = 3.45
(P > 0.05, not significant).

Thus, the no-interaction model is acceptable.


The test is a likelihood ratio (LR) test which
compares log-likelihood statistics for the interaction
model and the no-interaction model. That
is, the LR test statistic is of the form -2 ln L_R
minus -2 ln L_F, where R denotes the reduced
model, which in this case is the no-interaction
model, and F denotes the full model, which is the
interaction model.

This LR test statistic has approximately a 
chi-square distribution with 2 degrees of freedom 
under the null hypothesis that the no-interaction 
model is correct. The degrees of freedom here is 2 
because there are two product terms being tested 
in the interaction model. 

The log-likelihood statistic for the reduced
model comes from the computer output for the
no-interaction model and is equal to -2 times
-57.560, or 115.120.

The log-likelihood statistic for the full model
comes from the computer results for the interaction
model and is equal to -2 times -55.835, or
111.670.

The LR statistic is therefore 115.120 minus
111.670, which equals 3.45. This value is not
significant at the 0.05 level for 2 degrees of freedom.
Thus, it appears that despite the numerical
difference between corresponding coefficients in the
female and male models, there is no statistically
significant difference. We can therefore conclude
for these data that the no-interaction model is
acceptable (at least at the 0.05 level).
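
The calculation just described can be reproduced with the following minimal Python sketch. The function names are hypothetical. The two log-likelihood values are taken from the printed output, and the P-value uses the fact that the upper-tail probability of a chi-square variable with 2 degrees of freedom is exactly exp(-x/2).

```python
import math


def lr_statistic(loglik_reduced, loglik_full):
    """LR = -2 ln L_R - (-2 ln L_F)."""
    return -2.0 * loglik_reduced - (-2.0 * loglik_full)


def chi2_sf_2df(x):
    """P(chi-square with 2 df > x); for 2 df this equals exp(-x/2)."""
    return math.exp(-x / 2.0)


lr = lr_statistic(-57.560, -55.835)   # no-interaction vs. interaction model
p_value = chi2_sf_2df(lr)
print(f"LR = {lr:.2f}, p = {p_value:.3f}")
```

The resulting P-value is roughly 0.18, which agrees with the statement that the test is not significant at the 0.05 level.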


Remission data example: 

• described no-interaction 
assumption 

• evaluated assumption using LR 
test 

• provided interaction model if 
needed 

Now, we generalize this process. 


Using the remission data example, we have 
described the no-interaction assumption, have 
shown how to evaluate this assumption using a 
likelihood ratio test, and have provided the form 
of an interaction model that should be used in case 
the no-interaction assumption does not hold. We 
now describe this process more generally for any 
stratified Cox analysis. 
No-interaction SC model:

h_g(t,X) = h_{0g}(t) exp[β_1 X_1 + β_2 X_2 + ... + β_p X_p]

g = 1, 2, ..., k*, strata defined
from Z*

SC model allowing interaction:

h_g(t,X) = h_{0g}(t) exp[β_{1g} X_1 + β_{2g} X_2 + ... + β_{pg} X_p]

g = 1, 2, ..., k*, strata defined
from Z*

Alternative SC interaction model:

• uses product terms involving Z*

• define k* - 1 dummy variables
  Z*_1, Z*_2, ..., Z*_{k*-1} from Z*

• products of the form Z*_i x X_j,
  where i = 1, ..., k* - 1 and
  j = 1, ..., p

h_g(t,X) = h_{0g}(t) exp[β_1 X_1 + ... + β_p X_p
    + β_11 (Z*_1 x X_1) + ... + β_p1 (Z*_1 x X_p)
    + β_12 (Z*_2 x X_1) + ... + β_p2 (Z*_2 x X_p)
    + ...
    + β_{1,k*-1} (Z*_{k*-1} x X_1) + ... + β_{p,k*-1} (Z*_{k*-1} x X_p)]

g = 1, 2, ..., k*, strata defined from Z*


Recall that the general form of the no-interaction
model for the stratified Cox procedure is given as
shown here. This model allows for several variables
being stratified through the use of a newly
defined variable called Z*, whose strata consist of
combinations of categories of the variables being
stratified.

If, in contrast, we allow for interaction of the Z*
variable with the X's in the model, we can write
the model as shown here. Notice that in this interaction
model, each regression coefficient has the
subscript g, which denotes the gth stratum and
indicates that the regression coefficients are different
for different strata of Z*.

An alternative way to write the interaction model
uses product terms involving the variable Z*
with each of the predictors. However, to write
this model correctly, we need to use k* - 1
dummy variables to distinguish the k* categories
of Z*; also, each of these dummy variables,
which we denote as Z*_1, Z*_2, ..., Z*_{k*-1}, needs to
be involved in a product term with each of
the X's.

The hazard model formula for this alternative model is
shown here. Notice that the first line of the formula
contains the X’s by themselves, the next line
contains products of each X_j with Z*_1, the third
line contains the products with Z*_2, and the last
line contains products with Z*_{k*-1}. Note also that
the subscript g occurs only with the baseline hazard
function h_{0g}(t), and is not explicitly used in
the β coefficients.
EXAMPLE (Remission Data)

Z* = sex, k* = 2,
Z*_1 = sex(0,1),
X_1 = log WBC, X_2 = Rx (p = 2)

h_g(t,X) = h_{0g}(t) exp[β_1 X_1 + β_2 X_2
        + β_11 (Z*_1 x X_1)
        + β_21 (Z*_1 x X_2)]

    = h_{0g}(t) exp[β*_1 log WBC
        + β*_2 Rx + β*_3 (sex x log WBC)
        + β*_4 (sex x Rx)]

g = 1, 2

β_1 = β*_1, β_2 = β*_2, β_11 = β*_3, and β_21 = β*_4


In our previous example involving the remission
data, the stratification variable (Z*) was the variable
SEX, and k* was equal to 2; thus, we have
only one dummy variable Z*_1, which uses a (0,1)
coding to indicate sex, and we have only (p equal
to) two predictors: X_1 equal to log WBC and X_2
equal to Rx. The interaction model is then written
in either of the forms shown here.

The latter version of the interaction model is what
we previously presented for the remission data example.
Because the two versions presented here
are equivalent, it follows that β_1 = β*_1, β_2 = β*_2,
β_11 = β*_3, and β_21 = β*_4.


We have thus seen that the interaction model can
be written in a format that contains product terms
involving dummy variables (i.e., Z*_i) for the variable
being stratified being multiplied by each of
the predictors (i.e., X_j) not being stratified. We
will use this model to describe a test of the
no-interaction assumption.


Testing the no-interaction assumption:

LR = -2 ln L_R - (-2 ln L_F)

R = reduced (no-interaction) model
F = full (interaction) model
    contains product terms

H_0:  β_11 = ... = β_p1 = 0
      β_12 = ... = β_p2 = 0
      ...
      β_{1,k*-1} = ... = β_{p,k*-1} = 0


The test is a likelihood ratio (LR) test which compares
log likelihood statistics for the interaction
model and the no-interaction model. That is, the
LR test statistic is of the form -2 ln L_R minus
-2 ln L_F, where R denotes the reduced model,
which in this case is the no-interaction model, and
F denotes the full model, which is the interaction
model.

The no-interaction model differs from the interaction
model in that the latter contains additional
product terms. Thus, one way to state the null
hypothesis of no interaction is that the coefficients
of each of these product terms are all zero.


LR ~ χ²_{p(k*-1) df}
under H_0: no interaction

p(k* - 1) gives number of product
terms being tested in interaction
model


The LR test statistic has approximately a chi-square
distribution with p(k* - 1) degrees of freedom
under the null hypothesis. The degrees of
freedom here is p(k* - 1) because this value gives
the number of product terms that are being tested
in the interaction model.
EXAMPLE (Remission Data)

Z* = sex, k* = 2,
Z*_1 = sex(0,1),
X_1 = log WBC, X_2 = Rx (p = 2)
p(k* - 1) = 2, so

LR ~ χ²_{2 df} under H_0: no interaction


Returning to the remission data example, for
which p = 2 and k* = 2, the value of p(k* - 1)
is equal to two times (2 - 1), which equals two.
Thus, to test whether the SEX variable interacts
with the log WBC and Rx predictors, the degrees
of freedom for the LR statistic is two, as previously
described.
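
The counting rule p(k* - 1) can be made concrete with a small Python sketch. The function names and the label format are hypothetical and serve only to enumerate the product terms Z*_i x X_j that the LR test sets to zero.

```python
def product_terms(predictors, k_star):
    """Labels for the p(k* - 1) product terms Z*_i x X_j."""
    return [f"Z*_{i} x {x}" for x in predictors for i in range(1, k_star)]


def interaction_df(p, k_star):
    """Degrees of freedom of the LR test of no interaction."""
    return p * (k_star - 1)


remission_terms = product_terms(["log WBC", "Rx"], k_star=2)
print(remission_terms)          # the two product terms with sex
print(interaction_df(2, 2))     # remission data: p = 2, k* = 2
print(interaction_df(2, 8))     # two predictors and eight strata
```

For the remission data this gives two product terms and 2 degrees of freedom; with two predictors and eight strata the same rule gives 14.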


V. A Second Example 
Involving Several 
Stratification Variables 


EXAMPLE 


vets.dat: survival time in days, n = 137

Veteran’s Administration Lung Cancer Trial
Column 1: Treatment (standard = 1, test = 2)
Column 2: Cell type 1 (large = 1, other = 0)
Column 3: Cell type 2 (adeno = 1, other = 0)
Column 4: Cell type 3 (small = 1, other = 0)
Column 5: Cell type 4 (squamous = 1, other = 0)
Column 6: Survival time (days)
Column 7: Performance status (0 = worst, ..., 100 = best)
Column 8: Disease duration (months)
Column 9: Age
Column 10: Prior therapy (none = 0, some = 10)
Column 11: Status (0 = censored, 1 = died)


Cox regression
Analysis time _t: survt

              Coef.    Std. Err.   p > |z|   Haz. Ratio   [95% Conf. Interval]   P(PH)
Treatment      0.290   0.207       0.162     1.336        0.890    2.006         0.628
Large cell     0.400   0.283       0.157     1.491        0.857    2.594         0.033
Adeno cell     1.188   0.301       0.000     3.281        1.820    5.915         0.081
Small cell     0.856   0.275       0.002     2.355        1.374    4.037         0.078
Perf. Stat    -0.033   0.006       0.000     0.968        0.958    0.978         0.000
Dis. Durat.    0.000   0.009       0.992     1.000        0.982    1.018         0.919
Age           -0.009   0.009       0.358     0.991        0.974    1.010         0.198
Pr. Therapy    0.007   0.023       0.755     1.007        0.962    1.054         0.145

No. of subjects = 137   Log likelihood = -475.180


Variables not satisfying PH:

• cell type (3 dummy variables)

• performance status

• prior therapy (possibly)

SC model: stratifies on cell type and
performance status


The dataset “vets.dat” considers survival times in
days for 137 patients from the Veteran's Administration
Lung Cancer Trial cited by Kalbfleisch and
Prentice in their text (The Statistical Analysis of
Failure Time Data, Wiley, pp. 223-224, 1980). The
exposure variable of interest is treatment status.
Other variables of interest as control variables are
cell type (four types, defined in terms of dummy
variables), performance status, disease duration,
age, and prior therapy status. Failure status is
defined by the status variable. A complete list of the
variables is shown here.


Here we provide computer output obtained from 
fitting a Cox PH model to these data. Using the 
P(PH) information in the last column, we can see 
that at least four of the variables listed have P(PH) 
values below the 0.100 level. These four variables 
are labeled in the output as large cell (0.033), 
adeno cell (0.081), small cell (0.078), and Perf. Stat 
(0.000). Notice that the three variables, large cell, 
adeno cell, and small cell, are dummy variables 
that distinguish the four categories of cell type. 

Thus, it appears from the P(PH) results that the
variables cell type (defined using dummy variables)
and performance status do not satisfy the
PH assumption.

Based on the conclusions just made about the PH
assumption, we now describe a stratified Cox analysis
that stratifies on the variables cell type and
performance status.
EXAMPLE (continued)

Z* given by combinations of categories:

• cell type (four categories)

• performance status (interval), changed to
  PSbin (two categories)

Z* has k* = 4 x 2 = 8 categories

Four other variables considered as X's:

• treatment status

• disease duration

• age

• prior therapy

Here, we use treatment status and age
as X’s

Stratified Cox regression
Analysis time _t: survt

             Coef.    Std. Err.   p > |z|   Haz. Ratio   [95% Conf. Interval]
Treatment     0.125   0.208       0.548     1.134        0.753    1.706
Age          -0.001   0.010       0.897     0.999        0.979    1.019

No. of subjects = 137   Log likelihood = -262.020   Stratified by Z*

No-interaction model
HR = 1.134 (P = 0.548)

Treatment effect (adjusted for age
and Z*) is nonsignificant

No-interaction model:
h_g(t,X) = h_{0g}(t) exp[β_1 Treatment + β_2 Age]
g = 1, 2, ..., 8 (= # of strata
defined from Z*)

Interaction model:
h_g(t,X) = h_{0g}(t) exp[β_{1g} Treatment + β_{2g} Age]
g = 1, 2, ..., 8


Because we are stratifying on two variables, we 
need to form a single new categorical variable 
Z* whose categories represent combinations of 
categories of the two variables. The cell type 
variable has four categories by definition. The 
performance status variable, however, is an 
interval variable ranging between 0 for worst to 
100 for best, so it needs to be categorized. We 
categorize this variable into two groups using a 
cutpoint of 60, and we denote this binary variable 
as PSbin. Thus, the number of categories for our 
Z* variable is 4 x 2, or 8; that is, k* = 8. 
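
As an illustration, the construction of Z* might be coded as in the Python sketch below. Several details are assumptions rather than facts from the text: the helper names are hypothetical, the text does not say on which side of the cutpoint a value of exactly 60 falls (here a status of 60 or more is coded PSbin = 1), and the strata are numbered so that g = 1 is squamous cell with PSbin = 0, matching the listing given later in this example.

```python
from itertools import product

# Reference category first, then the three cell-type dummies Z*_1..Z*_3.
CELL_TYPES = ["squamous", "large", "adeno", "small"]


def ps_bin(performance_status, cutpoint=60):
    """ASSUMPTION: 1 if status >= cutpoint, else 0 (boundary side not stated)."""
    return 1 if performance_status >= cutpoint else 0


# Number the 4 x 2 = 8 combinations g = 1..8:
# g = 1..4 have PSbin = 0, g = 5..8 have PSbin = 1.
STRATUM = {combo: g
           for g, combo in enumerate(product([0, 1], CELL_TYPES), start=1)}


def z_star(cell_type, performance_status):
    """Stratum number g for one patient."""
    return STRATUM[(ps_bin(performance_status), cell_type)]


k_star = len(STRATUM)
print(k_star)
print(z_star("large", 40), z_star("adeno", 80))
```

Each patient then belongs to exactly one of the k* = 8 strata, and the stratified analysis is carried out on this single variable.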

In addition to the two stratification variables, cell 
type and performance status, there are four other 
variables to be considered as predictors in the 
stratified Cox model. These are treatment status, 
disease duration, age, and prior therapy. 

For illustrative purposes here, we use only 
treatment status and age as predictors. The 
other two variables, disease duration and prior 
therapy, are considered in exercises following this 
presentation. 

Here we show computer output from fitting a
stratified Cox model that stratifies on cell type
and performance status using the eight-category
stratification variable Z*. This model also includes
treatment and age as predictors. These
results consider a no-interaction model, because
only one regression coefficient is provided for
the treatment and age predictors. Notice that the
estimated hazard ratio is 1.134 for the effect of
the treatment variable adjusted for age and Z*,
the latter being adjusted by stratification. The
p-value for this adjusted treatment effect is 0.548,
which is highly nonsignificant.
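
The hazard ratio, confidence limits, and p-value for treatment can be approximately recovered from the printed coefficient and standard error. The Python sketch below assumes that the output reports Wald-type quantities (a normal-based interval and a two-sided z test); the text does not state this, and the function name is hypothetical.

```python
import math


def wald_summary(coef, std_err, z_crit=1.96):
    """Hazard ratio, 95% limits, and two-sided Wald p-value (assumed method)."""
    hr = math.exp(coef)
    lower = math.exp(coef - z_crit * std_err)
    upper = math.exp(coef + z_crit * std_err)
    z = coef / std_err
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))   # 2 * (1 - Phi(|z|))
    return hr, lower, upper, p_two_sided


hr, lower, upper, p = wald_summary(0.125, 0.208)   # treatment row of the output
print(f"HR = {hr:.3f}, 95% CI = ({lower:.3f}, {upper:.3f}), p = {p:.3f}")
```

Because the inputs are rounded to three decimals, the results agree with the printed values (1.134, 0.753 to 1.706, and 0.548) only up to small rounding differences.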

The no-interaction model we have just described 
has the hazard function formula shown here. 


To evaluate whether the no-interaction model
is appropriate, we need to define an interaction
model that allows different regression coefficients
for different strata. One way to write this
interaction model is shown here.
EXAMPLE (continued)

Alternative interaction model:
h_g(t,X) = h_{0g}(t) exp[β_1 Treatment + β_2 Age
    + β_11 (Z*_1 x Treatment) + ... + β_17 (Z*_7 x Treatment)
    + β_21 (Z*_1 x Age) + ... + β_27 (Z*_7 x Age)]

g = 1, 2, ..., 8

Another version of interaction model:
Replace Z*_1, ..., Z*_7 by
Z*_1 = large cell (binary)
Z*_2 = adeno cell (binary)
Z*_3 = small cell (binary)
Z*_4 = PSbin (binary)
Z*_5 = Z*_1 x Z*_4
Z*_6 = Z*_2 x Z*_4
Z*_7 = Z*_3 x Z*_4

h_g(t,X) = h_{0g}(t) exp[β_1 Treatment + β_2 Age
    + β_11 (tr Z*_1) + β_12 (tr Z*_2) + β_13 (tr Z*_3)
    + β_14 (tr Z*_4) + β_15 (tr Z*_1 Z*_4)
    + β_16 (tr Z*_2 Z*_4) + β_17 (tr Z*_3 Z*_4)
    + β_21 (AGE Z*_1) + β_22 (AGE Z*_2)
    + β_23 (AGE Z*_3) + β_24 (AGE Z*_4)
    + β_25 (AGE Z*_1 Z*_4) + β_26 (AGE Z*_2 Z*_4)
    + β_27 (AGE Z*_3 Z*_4)]


An alternative version of this interaction model
that involves product terms is shown here. This
version uses seven dummy variables denoted as
Z*_1, Z*_2 up through Z*_7 to distinguish the eight
categories of the stratification variable Z*. The model
contains the main effects of treatment and age
plus interaction terms involving products of each
of the seven dummy variables with each of the two
predictors.

Yet another version of the interaction model is to
replace the seven dummy variables Z*_1 to Z*_7 by
the seven variables listed here. These variables are
three of the binary variables making up the cell
type variable, the binary variable for performance
status, plus three product terms involving each of
the cell type dummy variables multiplied by the
PSbin dummy variable (Z*_4).

The latter interaction model is shown here. In this
model, the variable tr Z*_1 denotes the product of
treatment status with the large cell dummy Z*_1, the
variable tr Z*_2 denotes the product of treatment
status with the adeno cell variable Z*_2, and so on.
Also, the variable tr Z*_1 Z*_4 denotes the triple
product of treatment status times the large cell
variable Z*_1 times the PSbin variable Z*_4, and so on,
for the other triple product terms involving treatment.
Similarly, for the terms involving age, the
variable Age Z*_1 denotes the product of age with
Z*_1, and the variable Age Z*_1 Z*_4 denotes the triple
product of age times Z*_1 times Z*_4.


Note that we are just considering the interaction
between the stratified variables and the predictors.
We could also (but do not) consider the interaction
between the two predictors, treatment and age.
EXAMPLE (continued)

Stratified Cox Regression Analysis on
Variable: Z*
Response: Surv. Time

                  Coef.    Std. Err.   p > |z|   Haz. Ratio   [95% Conf. Interval]
Treatment          0.286   0.664       0.667     1.331        0.362    4.893
Age                0.000   0.030       0.978     0.999        0.942    1.060
tr Z*_1            2.351   1.772       0.184     10.495       0.326    337.989
tr Z*_2           -1.158   0.957       0.226     0.314        0.048    2.047
tr Z*_3            0.582   0.855       0.496     1.790        0.335    9.562
tr Z*_4           -1.033   0.868       0.234     0.356        0.065    1.950
tr Z*_1 Z*_4      -0.794   1.980       0.688     0.452        0.009    21.882
tr Z*_2 Z*_4       2.785   1.316       0.034     16.204       1.229    213.589
tr Z*_3 Z*_4       0.462   1.130       0.683     1.587        0.173    14.534
Age Z*_1           0.078   0.064       0.223     1.081        0.954    1.225
Age Z*_2          -0.047   0.045       0.295     0.954        0.873    1.042
Age Z*_3          -0.059   0.042       0.162     0.943        0.868    1.024
Age Z*_4           0.051   0.048       0.287     1.053        0.958    1.157
Age Z*_1 Z*_4     -0.167   0.082       0.042     0.847        0.721    0.994
Age Z*_2 Z*_4     -0.045   0.068       0.511     0.956        0.838    1.092
Age Z*_3 Z*_4      0.041   0.061       0.499     1.042        0.924    1.175

No. of subjects = 137   Log likelihood = -249.972   Stratified by Z*


Eight possible combinations of Z*_1 to Z*_4:

g = 1: Z*_1 = Z*_2 = Z*_3 = Z*_4 = 0
g = 2: Z*_1 = 1, Z*_2 = Z*_3 = Z*_4 = 0
g = 3: Z*_2 = 1, Z*_1 = Z*_3 = Z*_4 = 0
g = 4: Z*_3 = 1, Z*_1 = Z*_2 = Z*_4 = 0
g = 5: Z*_1 = Z*_2 = Z*_3 = 0, Z*_4 = 1
g = 6: Z*_1 = 1, Z*_2 = Z*_3 = 0, Z*_4 = 1
g = 7: Z*_2 = 1, Z*_1 = Z*_3 = 0, Z*_4 = 1
g = 8: Z*_3 = 1, Z*_1 = Z*_2 = 0, Z*_4 = 1

g = 1: Z*_1 = Z*_2 = Z*_3 = Z*_4 = 0
(Squamous cell type and PSbin = 0)

All product terms are zero:
h_1(t,X) = h_{01}(t) exp[β_1 Treatment + β_2 Age],
where β̂_1 = 0.286,
β̂_2 = 0.000, so that

ĥ_1(t,X) = ĥ_{01}(t) exp[(0.286) Treatment]

g = 2: Z*_1 = 1, Z*_2 = Z*_3 = Z*_4 = 0
(Large cell type and PSbin = 0)

Nonzero product terms       Coefficients
Age Z*_1 = Age              β_21
tr Z*_1 = Treatment         β_11


Here we provide the computer results from fitting
the interaction model just described. Notice that
the first two variables listed are the main effects
of treatment status and age. The next seven variables
are product terms involving the interaction
of treatment status with the seven categories of Z*.
The final seven variables are product terms involving
the interaction of age with the seven categories
of Z*. As defined on the previous page, the seven
variables used to define Z* consist of three dummy
variables Z*_1, Z*_2, and Z*_3 for cell type, a binary
variable Z*_4 for performance status, and products of Z*_4
with each of Z*_1, Z*_2, and Z*_3. Note that once the
variables Z*_1, Z*_2, Z*_3, and Z*_4 are specified, the
values of the three product terms are automatically
determined.

We can use these results to show that the interaction
model being fit yields different regression
coefficients for each of the eight categories defined
by the subscript g for the stratification variable Z*.
These eight categories represent the possible
combinations of the four variables Z*_1 to Z*_4, as shown
here.


Consider the hazard function when the variables
Z*_1 through Z*_4 are all equal to zero. This stratum
is defined by the combination of squamous cell
type and a binary performance status value of 0.
In this case, all product terms are equal to zero
and the hazard model contains only the main effect
terms treatment and age. The estimated hazard
function for this stratum uses the coefficients
0.286 for treatment and 0.000 for age, yielding the
expression shown here. Note that age drops out
of the expression because its coefficient is zero to
three decimal places.


Now consider the hazard function when the variable
Z*_1 equals 1 and Z*_2 through Z*_4 are equal to
zero. This stratum is defined by the combination
of large cell type and a PSbin value of 0. In this
case, the only nonzero product terms are Age Z*_1
and tr Z*_1, whose coefficients are β_21 and β_11,
respectively.
EXAMPLE (continued)

h_2(t,X) = h_{02}(t) exp[(β_1 + β_11) Treatment
        + (β_2 + β_21) Age]

β̂_1 = 0.286, β̂_2 = 0.000
β̂_11 = 2.351, β̂_21 = 0.078

Hazard functions for interaction model:

g = 1: (Z*_1 = Z*_2 = Z*_3 = Z*_4 = 0):
ĥ_1(t,X) = ĥ_{01}(t) exp[(0.286) Treatment]

g = 2: (Z*_1 = 1, Z*_2 = Z*_3 = Z*_4 = 0):
ĥ_2(t,X) = ĥ_{02}(t) exp[(2.637) Treatment + (0.078) Age]

g = 3: (Z*_2 = 1, Z*_1 = Z*_3 = Z*_4 = 0):
ĥ_3(t,X) = ĥ_{03}(t) exp[(-0.872) Treatment + (-0.047) Age]

g = 4: (Z*_3 = 1, Z*_1 = Z*_2 = Z*_4 = 0):
ĥ_4(t,X) = ĥ_{04}(t) exp[(0.868) Treatment + (-0.059) Age]

g = 5: (Z*_1 = Z*_2 = Z*_3 = 0, Z*_4 = 1):
ĥ_5(t,X) = ĥ_{05}(t) exp[(-0.747) Treatment + (0.051) Age]

g = 6: (Z*_1 = 1, Z*_2 = Z*_3 = 0, Z*_4 = 1):
ĥ_6(t,X) = ĥ_{06}(t) exp[(0.810) Treatment + (-0.038) Age]

g = 7: (Z*_2 = 1, Z*_1 = Z*_3 = 0, Z*_4 = 1):
ĥ_7(t,X) = ĥ_{07}(t) exp[(0.880) Treatment + (-0.041) Age]

g = 8: (Z*_3 = 1, Z*_1 = Z*_2 = 0, Z*_4 = 1):
ĥ_8(t,X) = ĥ_{08}(t) exp[(0.297) Treatment + (0.033) Age]

LR test to compare no-interaction model
with interaction model:

H_0: no-interaction model acceptable, i.e.,
Treatment: β_11 = β_12 = ... = β_17 = 0
and Age: β_21 = β_22 = ... = β_27 = 0

14 coefficients => df = 14

LR = -2 ln L_R - (-2 ln L_F)

R = reduced (no-interaction) model
F = full (interaction) model

The hazard function for this second stratum is
shown here. Notice that the coefficients of the
treatment and age variables are (β_1 + β_11) and
(β_2 + β_21), respectively. The estimated values of
each of these coefficients are given here.

The corresponding estimated hazard function for
the second stratum (i.e., g = 2) is shown here. For
comparison, we repeat the estimated hazard function
for the first stratum.

The estimated hazard functions for the remaining
strata are provided here. We leave it up to the
reader to verify these formulae. Notice that the
coefficients of treatment are all different in the eight
strata, and the coefficients of age also are all
different in the eight strata.
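
One way to carry out this verification is sketched below in Python. The dictionary layout and the function name are hypothetical; the numbers are the estimates printed in the interaction-model output, and the strata are numbered as in the list of eight combinations (g = 1 to 4 have PSbin = 0, g = 5 to 8 have PSbin = 1).

```python
# Main-effect estimates from the interaction-model output.
MAIN = {"tr": 0.286, "age": 0.000}

# Product-term estimates, keyed by the Z* variable(s) in the product.
TR = {"Z1": 2.351, "Z2": -1.158, "Z3": 0.582, "Z4": -1.033,
      "Z1Z4": -0.794, "Z2Z4": 2.785, "Z3Z4": 0.462}
AGE = {"Z1": 0.078, "Z2": -0.047, "Z3": -0.059, "Z4": 0.051,
       "Z1Z4": -0.167, "Z2Z4": -0.045, "Z3Z4": 0.041}


def stratum_coefs(cell, psbin):
    """cell: 0 = squamous (reference), 1 = large, 2 = adeno, 3 = small."""
    active = []
    if cell:
        active.append(f"Z{cell}")        # cell-type dummy equals 1
    if psbin:
        active.append("Z4")              # PSbin equals 1
    if cell and psbin:
        active.append(f"Z{cell}Z4")      # their product equals 1
    tr = MAIN["tr"] + sum(TR[k] for k in active)
    age = MAIN["age"] + sum(AGE[k] for k in active)
    return round(tr, 3), round(age, 3)


table = {g: stratum_coefs((g - 1) % 4, (g - 1) // 4) for g in range(1, 9)}
for g, (tr, age) in table.items():
    print(f"g = {g}: Treatment {tr:+.3f}, Age {age:+.3f}")
```

For example, in stratum g = 6 the treatment coefficient is 0.286 + 2.351 - 1.033 - 0.794 = 0.810 and the age coefficient is 0.000 + 0.078 + 0.051 - 0.167 = -0.038, as shown in the formulae.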


We have presented computer results for both the
no-interaction and the interaction models. To evaluate
whether the no-interaction assumption is satisfied,
we need to carry out a likelihood ratio test
to compare these two models.

The null hypothesis being tested is that the
no-interaction model is acceptable. Equivalently, this
null hypothesis can be stated by setting the
coefficients of all product terms in the interaction
model to zero. That is, the seven coefficients of
product terms involving treatment and the seven
coefficients of the product terms involving age are
set equal to zero as shown here.

Because the null hypothesis involves 14 coefficients,
the degrees of freedom of the LR chi-square
statistic is 14. The test statistic takes the
usual form involving the difference between
log-likelihood statistics for the reduced and full models,
where the reduced model is the no-interaction
model and the full model is the interaction model.
EXAMPLE (continued)

LR ~ χ²_{14 df} under H_0: no interaction

LR = (-2 x -262.020) - (-2 x -249.972)
   = 524.040 - 499.944 = 24.096
P = 0.045 (significant at 0.05)

Conclusion:
Reject H_0: interaction model is
preferred.


Might use further testing to simplify 
interaction model, e.g., test for seven 
products involving treatment or test for 
seven products involving age. 


Thus, under the null hypothesis, the LR statistic
is approximately chi-square with 14 degrees of
freedom.

The computer results for the no-interaction and
interaction models give -2 ln L values of
524.040 and 499.944, respectively. The difference
is 24.096. A chi-square value of 24.096 with 14 degrees
of freedom yields a p-value of 0.045, so that
the test gives a significant result at the 0.05 level.
This indicates that the no-interaction model is not
acceptable and the interaction model is preferred.

Note, however, that it may be possible from fur¬ 
ther statistical testing to simplify the interaction 
model to have fewer than 14 product terms. For 
example, one might test for only the seven prod¬ 
uct terms involving treatment or only the seven 
product terms involving age. 
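
The arithmetic of this test can be checked with a short sketch. The helper name `chi2_sf_even_df` is ours, not part of any package; it relies on a closed form for the chi-square upper tail that is exact only when the degrees of freedom are even, which holds here (14 df). The log-likelihood values are those quoted in the example.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    Uses exp(-x/2) * sum_{i < df/2} (x/2)^i / i!, which is exact only for
    even df. This restriction is an assumption of the sketch.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this helper handles positive even df only")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


loglik_reduced = -262.020  # no-interaction model
loglik_full = -249.972     # interaction model

# LR = -2 ln L_R - (-2 ln L_F)
lr_stat = -2 * loglik_reduced - (-2 * loglik_full)
p_value = chi2_sf_even_df(lr_stat, df=14)

print(round(lr_stat, 3), round(p_value, 3))
```

With these inputs the statistic is 24.096 and the p-value rounds to 0.045, in agreement with the example.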


VI. A Graphical View of the 
Stratified Cox Approach 

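a. $h(t) = h_0(t)\exp(\beta_1 RX + \beta_2 SEX)$

$\ln(-\ln S(t)) = \ln(-\ln S_0(t)) + \beta_1 RX + \beta_2 SEX$

[Figure: log-log survival curves for model a, one curve for each combination of SEX and RX]
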

In this section we examine four log-log survival 
plots illustrating the assumptions underlying a 
stratified Cox model with or without interaction. 
Each of the four models considers two dichoto¬ 
mous predictors: treatment (coded RX = 1 for 
placebo and RX = 0 for new treatment) and SEX 
(coded 0 for females and 1 for males). The four 
models are as follows (see left). 

a. $h(t) = h_0(t)\exp(\beta_1 RX + \beta_2 SEX)$. This model
assumes the PH assumption for both RX 
and SEX and also assumes no interaction 
between RX and SEX. Notice all four 
log-log curves are parallel (PH assumption) 
and the effect of treatment is the same for 
females and males (no interaction). The 
effect of treatment (controlling for SEX) 
can be interpreted as the distance between 
the log-log curves from RX = 1 to RX = 0, 
for males and for females, separately.

b. $h(t) = h_0(t)\exp(\beta_1 RX + \beta_2 SEX + \beta_3 RX \times SEX)$

$\ln(-\ln S(t)) = \ln(-\ln S_0(t)) + \beta_1 RX + \beta_2 SEX + \beta_3 RX \times SEX$


c. $h(t) = h_{0g}(t)\exp(\beta_1 RX)$
(g = 1 for males, g = 0 for females)

$\ln(-\ln S(t)) = \ln(-\ln S_{0g}(t)) + \beta_1 RX$


d. $h(t) = h_{0g}(t)\exp(\beta_1 RX + \beta_2 RX \times SEX)$
(g = 1 for males, g = 0 for females)

$\ln(-\ln S(t)) = \ln(-\ln S_{0g}(t)) + \beta_1 RX + \beta_2 RX \times SEX$


b. $h(t) = h_0(t)\exp(\beta_1 RX + \beta_2 SEX + \beta_3 RX \times SEX)$.
This model assumes the PH
assumption for both RX and SEX and 
allows for interaction between these two 
variables. All four log-log curves are 
parallel (PH assumption) but the effect of 
treatment is larger for males than females 
as the distance from RX = 1 to RX = 0 is 
greater for males. 


c. $h(t) = h_{0g}(t)\exp(\beta_1 RX)$, where g = 1 for
males, g = 0 for females. This is a stratified 
Cox model in which the PH assumption is 
not assumed for SEX. Notice the curves for 
males and females are not parallel. 
However, the curves for RX are parallel 
within each stratum of SEX indicating that 
the PH assumption is satisfied for RX. The 
distance between the log-log curves from 
RX = 1 to RX = 0 is the same for males 
and females indicating no interaction 
between RX and SEX. 


d. $h(t) = h_{0g}(t)\exp(\beta_1 RX + \beta_2 RX \times SEX)$,
where g = 1 for males, g = 0 for females. 
This is a stratified Cox model allowing for 
interaction of RX and SEX. The curves for 
males and females are not parallel 
although the PH assumption is satisfied for 
RX within each stratum of SEX. The 
distance between the log-log curves from 
RX = 1 to RX = 0 is greater for males than 
females indicating interaction between RX 
and SEX. 
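
The parallelism described in these four models can be illustrated numerically. The sketch below assumes a Weibull baseline survival function and a hypothetical value for the RX coefficient; neither comes from the data. It shows that the vertical gap between the RX = 1 and RX = 0 log-log curves is constant over time (the PH assumption for RX), whereas two strata with different baseline shapes give a gap that changes with time (nonparallel curves for SEX).

```python
import math


def log_minus_log_survival(t, beta, rx, shape=1.5, scale=10.0):
    """ln(-ln S(t)) under h(t) = h0(t) exp(beta * rx).

    The Weibull baseline S0(t) = exp(-(t / scale) ** shape) is an
    illustrative assumption; any baseline gives the same vertical shift.
    """
    baseline_cum_hazard = (t / scale) ** shape  # equals -ln S0(t)
    return math.log(baseline_cum_hazard) + beta * rx


beta_rx = 0.931  # hypothetical coefficient for RX
times = [1.0, 5.0, 10.0, 20.0]

# Gap between RX = 1 and RX = 0 within one stratum: constant, equal to beta.
rx_gaps = [
    log_minus_log_survival(t, beta_rx, 1) - log_minus_log_survival(t, beta_rx, 0)
    for t in times
]

# Gap between two strata whose baselines differ in shape: varies with t.
sex_gaps = [
    log_minus_log_survival(t, beta_rx, 0, shape=2.0)
    - log_minus_log_survival(t, beta_rx, 0, shape=1.0)
    for t in times
]

print(rx_gaps)
print(sex_gaps)
```

The first list repeats the coefficient at every time point; the second list does not settle on a single value, which is the graphical signature of a variable that should be stratified rather than modeled.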


VII. Summary

Stratified Cox (SC) model: 


We now summarize the most important features 
of the stratified Cox (SC) model described in this 
presentation. 


• stratification of predictors not 
satisfying PH assumption 

• includes predictors satisfying PH

• does not include stratified 
variables 


The SC model is a modification of the Cox PH 
model to allow for control by “stratification” of 
predictors not satisfying the PH assumption. Vari¬ 
ables that are assumed to satisfy the assumption 
are included in the model as predictors; the strat¬ 
ified variables are not included in the model. 


Computer Results

Stratified Cox regression
Analysis time _t: survt

          Coef.   Std. Err.   p > |z|   Haz. Ratio   [95% Conf. Interval]
log WBC   1.390   0.338       0.000     4.016        2.072    7.783
RX        0.931   0.472       0.048     2.537        1.006    6.396

No. of subjects = 42   Log likelihood = -57.560   Stratified by sex


The computer results for an SC model provide
essentially the same type of output as provided 
for a Cox PH model without stratification. An ex¬ 
ample of SC output using the remission data is 
shown here. The variables included as predictors 
in the model are listed in the first column followed 
by their estimated coefficients, standard errors, 
p-values, hazard ratio values, and 95% confidence 
limits. Such information cannot be provided for 
the variables being stratified, because these lat¬ 
ter variables are not explicitly included in the 
model. 
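
As a check on how the printed columns relate to one another, the hazard ratio and its 95% confidence limits can be recomputed from the coefficient and standard error. This is a minimal sketch using the large-sample formula exp(coef ± 1.96 × SE); the function name is ours.

```python
import math


def hazard_ratio_with_ci(coef, std_err, z=1.96):
    """Return (HR, lower, upper) from a Cox coefficient and its standard error."""
    hr = math.exp(coef)
    lower = math.exp(coef - z * std_err)
    upper = math.exp(coef + z * std_err)
    return hr, lower, upper


# RX row of the stratified Cox printout for the remission data
hr, lower, upper = hazard_ratio_with_ci(0.931, 0.472)
print(round(hr, 3), round(lower, 3), round(upper, 3))
```

Because the inputs are rounded to three decimals, the recomputed upper limit differs slightly in the third decimal from the printed value of 6.396.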


Hazard function for stratified Cox 
model: 

$h_g(t, X) = h_{0g}(t)\exp[\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p]$

g = 1, 2, ..., k*, strata defined
from Z*


The general hazard function form for the stratified 
Cox model is shown here. This formula contains 
a subscript g that indicates the gth stratum, where 
the strata are different categories of the stratifica¬ 
tion variable Z* and the number of strata equals 
k*. Notice that the baseline hazard functions are 
different in each stratum. 


Z* has k* categories
$X_1, X_2, \ldots, X_p$ satisfy PH


Stratification variable Z*:


• identify $Z_1, Z_2, \ldots, Z_k$ not
satisfying PH

• categorize each Z 

• form combinations of categories 
(strata) 

• each combination is a stratum 
of Z* 


The variable Z* is defined by first identifying the 
Z variables not satisfying the PH assumption. We
then categorize each Z and form combinations of 
categories of each of the Z's. Each combination 
represents a different stratum making up the vari¬ 
able Z*. 
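
The construction of Z* can be sketched as follows, using the second example's stratification variables (four cell types and a dichotomized performance status). The variable names and the numbering of the strata are illustrative assumptions; any consistent labeling of the combinations would serve.

```python
from itertools import product


def build_strata(categories_by_variable):
    """Map each combination of categories to a stratum number g = 1, ..., k*."""
    names = list(categories_by_variable)
    combos = product(*(categories_by_variable[name] for name in names))
    return {combo: g for g, combo in enumerate(combos, start=1)}


# Hypothetical labels for the categorized Z variables
z_categories = {
    "cell_type": ["large", "adeno", "small", "squamous"],
    "ps_bin": [0, 1],
}

strata = build_strata(z_categories)
k_star = len(strata)

print(k_star)
for combo, g in strata.items():
    print(g, combo)
```

Four categories of cell type crossed with two categories of performance status give k* = 8 strata, one for each combination.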
No-interaction model:

Same coefficients $\beta_1, \beta_2, \ldots, \beta_p$
for each g, i.e., Z* does not interact
with the X's.


Different baselines give different survival curves:

$\hat{h}_{01}(t) \Rightarrow \hat{S}_1(t)$
$\hat{h}_{02}(t) \Rightarrow \hat{S}_2(t)$
...
$\hat{h}_{0k^*}(t) \Rightarrow \hat{S}_{k^*}(t)$


HR same for each stratum


(Partial) likelihood function:
$L = L_1 \times L_2 \times \cdots \times L_{k^*}$


Stratified Cox model allowing interaction:


$h_g(t, X) = h_{0g}(t)\exp[\beta_{1g} X_1 + \beta_{2g} X_2 + \cdots + \beta_{pg} X_p]$

g = 1, 2, ..., k*, strata defined from Z*.


Alternative stratified Cox interaction
model:

• uses product terms involving Z*

• define k* - 1 dummy variables
from Z*

• products of the form $Z_i^* \times X_j$

Testing the no-interaction assumption:

$LR = -2 \ln L_R - (-2 \ln L_F)$

R = reduced (no-interaction) model
F = full (interaction) model
contains product terms
$LR \sim \chi^2_{p(k^*-1)\,df}$ under $H_0$: no
interaction


The above model is designated as a “no-interaction”
model because the $\beta$'s in the model
are the same for each subscript g. The
no-interaction assumption means that the variables
being stratified are assumed not to interact with
the X's in the model.

For the no-interaction model, the fitted SC model 
will yield different estimated survival curves for 
each stratum because the baseline hazard func¬ 
tions are different for each stratum. 

However, because the coefficients of the X's are the
same for each stratum, estimates of hazard ratios 
are the same for each stratum. 

Regression coefficients in the SC model are esti¬ 
mated by maximizing a partial likelihood function 
that is obtained by multiplying likelihood func¬ 
tions for each stratum. 
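
To make the product form concrete, here is a minimal sketch of a stratified log partial likelihood for a single covariate with no tied failure times. The toy data and function names are hypothetical. On the log scale the product $L_1 \times L_2 \times \cdots \times L_{k^*}$ becomes a sum over strata, and each risk set is formed only from subjects in the same stratum.

```python
import math


def log_partial_likelihood(beta, subjects):
    """Cox log partial likelihood for one stratum (single covariate, no ties).

    `subjects` is a list of (time, event, x) tuples, where event is 1 for a
    failure and 0 for a censored observation.
    """
    total = 0.0
    for time_i, event_i, x_i in subjects:
        if event_i != 1:
            continue
        # Risk set: subjects in this stratum still under observation at time_i
        risk_set = [x_j for time_j, _, x_j in subjects if time_j >= time_i]
        total += beta * x_i - math.log(sum(math.exp(beta * x_j) for x_j in risk_set))
    return total


def stratified_log_partial_likelihood(beta, strata):
    """Sum of per-stratum log partial likelihoods, i.e., ln(L_1 x ... x L_k*)."""
    return sum(log_partial_likelihood(beta, subjects) for subjects in strata.values())


# Hypothetical toy data: (survival time, event indicator, RX)
toy_strata = {
    "females": [(2, 1, 1), (5, 1, 0), (7, 0, 1)],
    "males": [(1, 1, 0), (3, 0, 1), (4, 1, 1), (9, 1, 0)],
}

print(stratified_log_partial_likelihood(0.0, toy_strata))
print(stratified_log_partial_likelihood(0.5, toy_strata))
```

The same coefficient is shared by both strata, which is the no-interaction assumption; only the risk sets, and hence the implied baseline hazards, differ by stratum. In practice the estimate is the value of the coefficient that maximizes this function.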

In order to evaluate the no-interaction assump¬ 
tion, we must define an interaction model for com¬ 
parison. One version of the interaction model is 
shown here. This version shows regression coeffi¬ 
cients with different subscripts in different strata; 
that is, each $\beta$ coefficient has a subscript g.

An alternative way to write the interaction model
uses product terms involving the Z* variable with
each predictor. This model uses k* - 1 dummy
variables to distinguish the k* categories of Z*. Each
of these dummy variables is included as a product
term with each of the X's.
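
The bookkeeping for the product terms can be sketched in a few lines. The labels below are hypothetical; the point is only that k* - 1 dummy variables crossed with p predictors give p(k* - 1) product terms.

```python
def interaction_terms(predictors, k_star):
    """List the product terms Z_i* x X_j for i = 1..k*-1 and each predictor X_j."""
    dummies = [f"Z{i}*" for i in range(1, k_star)]
    return [f"{z} x {x}" for z in dummies for x in predictors]


# Second example: p = 2 predictors and k* = 8 strata
terms = interaction_terms(["Treatment", "Age"], k_star=8)
print(len(terms))
print(terms[:4])
```

For the second example this yields 14 product terms, seven involving treatment and seven involving age.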


To evaluate the no-interaction assumption, we can
perform a likelihood ratio test that compares the
(reduced) no-interaction model to the (full)
interaction model. The null hypothesis is that the
no-interaction assumption is satisfied. The test
statistic is given by the difference between the
log-likelihood statistics ($-2 \ln L$) for the
no-interaction and interaction models. This statistic
is approximately chi-square under the null
hypothesis. The degrees of freedom is p(k* - 1),
where p denotes the number of X's and k* is the
number of categories making up Z*.
PRESENTATION COMPLETE!


Chapters 

1. Introduction to Survival 
Analysis 

2. Kaplan-Meier Survival Curves 
and the Log-Rank Test 

3. The Cox Proportional Hazards 
Model and Its Characteristics 

4. Evaluating the Proportional 
Hazards Assumption 

✓ 5. The Stratified Cox Procedure


Next: 


This presentation is now complete. We suggest 
that the reader review this presentation using the 
detailed outline that follows. Then answer the 
practice exercises and the test that follow. 

The next Chapter (6) is entitled “Extension of the 
Cox PH Model for Time-Dependent Variables.” 
There we show how an “extended” Cox model 
can be used as an alternative to the stratified Cox 
model when one or more predictors do not satisfy 
the PH assumption. We also discuss more gener¬ 
ally what is a time-dependent variable, and show 
how such a variable can be evaluated using an ex¬ 
tended Cox model. 


6. Extension of the Cox 

Proportional Hazards Model for 
Time-Dependent Variables 
Detailed Outline


I. Preview

A. Focus on how stratified Cox (SC) procedure is 
carried out: 

• analysis of computer results from SC 
procedure; 

• hazard function for SC model; 

• stratifying on a single predictor versus two or 
more predictors; 

• no-interaction versus interaction models. 


II. An Example

A. Cox PH results for remission data yield
P(PH) = 0.031 for SEX.

B. SC model used: control for SEX (stratified);
include log WBC and Rx in model.

C. Analysis of Rx effect from stratified Cox
results:

HR = 2.537; 95% CI: (1.006, 6.396); LR and
Wald tests: P < 0.05.


D. Hazard model:
$h_g(t, X) = h_{0g}(t)\exp[\beta_1 \log WBC + \beta_2 Rx]$, g = 1, 2

• different baseline hazard functions and
survival curves for females and males;

• same coefficients $\beta_1$ and $\beta_2$ for both females
and males (no-interaction assumption);

• obtain estimates by maximizing partial
likelihood $L = L_1 \times L_2$.

E. Graph of four adjusted survival curves for Rx
(adjusted for log WBC).

III. The General Stratified Cox (SC) Model

A. $h_g(t, X) = h_{0g}(t)\exp[\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p]$,
g = 1, 2, ..., k*,
where the strata are defined from the stratification
variable Z*.


B. Z* defined from $Z_1, Z_2, \ldots, Z_k$ variables that do
not satisfy PH:

• categorize each Z

• form combinations of categories

• each combination is a stratum of Z*


C. Different baseline hazard functions and survival
curves for each stratum.

D. Assumes no interaction: same coefficients
$\beta_1, \beta_2, \ldots, \beta_p$ for each g; i.e., Z* does not interact
with the X's; i.e., estimated HR is same for each
stratum.

E. Obtain estimates by maximizing partial likelihood
$L = L_1 \times L_2 \times \cdots \times L_{k^*}$, where $L_i$ is likelihood
for the ith stratum.

IV. The No-Interaction Assumption and How to Test It

A. Assumes same coefficients $\beta_1, \beta_2, \ldots, \beta_p$ for each
g.

B. Interaction model:

$h_g(t, X) = h_{0g}(t)\exp[\beta_{1g} X_1 + \beta_{2g} X_2 + \cdots + \beta_{pg} X_p]$,

g = 1, 2, ..., k* strata defined from Z*.

C. Alternative stratified Cox interaction model:

• uses product terms involving Z*

• define k* - 1 dummy variables
$Z_1^*, Z_2^*, \ldots, Z_{k^*-1}^*$ from Z*

• products of the form $Z_i^* \times X_j$, where
i = 1, ..., k* - 1; j = 1, ..., p

• hazard function: g = 1, 2, ..., k* strata
defined from Z*

$h_g(t, X) = h_{0g}(t)\exp[\beta_1 X_1 + \cdots + \beta_p X_p$
$+ \beta_{11}(Z_1^* \times X_1) + \cdots + \beta_{p1}(Z_1^* \times X_p)$
$+ \beta_{12}(Z_2^* \times X_1) + \cdots + \beta_{p2}(Z_2^* \times X_p)$
$+ \cdots + \beta_{1,k^*-1}(Z_{k^*-1}^* \times X_1) + \cdots + \beta_{p,k^*-1}(Z_{k^*-1}^* \times X_p)]$

D. Testing the no-interaction assumption: use LR
statistic given by $LR = -2 \ln L_R - (-2 \ln L_F)$
where R = reduced (no interaction) model and
F = full (interaction) model
$LR \sim \chi^2_{p(k^*-1)\,df}$ under $H_0$: no interaction,

i.e., $\beta_{11} = \beta_{21} = \cdots = \beta_{p,k^*-1} = 0$

V. A Second Example Involving Several Stratification
Variables

A. Dataset “vets.dat” from Veteran’s Administration
Lung Cancer Trial; n = 137; survival time in days.

B. Variables are: treatment status, cell type (four
types), performance status, disease duration, age,
and prior therapy status.

C. Cox PH results indicate [using P(PH)] that cell
type and performance status do not satisfy PH
assumption.

D. Example stratifies on cell type and performance
status using four categories of cell type and two
categories of performance status, so that Z* has
k* = 8 strata.

E. X’s considered in model are treatment status and
age.

F. Computer results for no-interaction model:
estimated HR for effect of treatment adjusted for
age and Z* is 1.134 (P = 0.548); not significant.

G. Hazard function for no-interaction model:
$h_g(t, X) = h_{0g}(t)\exp[\beta_1 Treatment + \beta_2 Age]$,
g = 1, 2, ..., 8

H. Hazard function for interaction model:
$h_g(t, X) = h_{0g}(t)\exp[\beta_{1g} Treatment + \beta_{2g} Age]$,
g = 1, 2, ..., 8

I. Alternative version of interaction model:
$h_g(t, X) = h_{0g}(t)\exp[\beta_1 Treatment + \beta_2 Age$
$+ \beta_{11}(Z_1^* \times Treatment) + \cdots + \beta_{17}(Z_7^* \times Treatment)$
$+ \beta_{21}(Z_1^* \times Age) + \cdots + \beta_{27}(Z_7^* \times Age)]$,
g = 1, 2, ..., 8

where $Z_1^*$ = large cell (binary), $Z_2^*$ = adeno cell
(binary), $Z_3^*$ = small cell (binary), $Z_4^*$ = PSbin
(binary), $Z_5^* = Z_1^* \times Z_4^*$, $Z_6^* = Z_2^* \times Z_4^*$,
$Z_7^* = Z_3^* \times Z_4^*$

J. Demonstration that alternative interaction version 
(in item I) is equivalent to original interaction 
formulation (in item H) using computer results for 
the alternative version. 

K. Test of no-interaction assumption:

• null hypothesis: $\beta_{11} = \beta_{12} = \cdots = \beta_{17} = 0$
and $\beta_{21} = \beta_{22} = \cdots = \beta_{27} = 0$

• $LR \sim \chi^2_{14\,df}$ under $H_0$: no interaction

• LR = 524.040 - 499.944 = 24.096
(P = 0.045)

Conclusion: Reject null hypothesis;
interaction model is preferred.

VI. A Graphical View of the Stratified Cox Approach

Comparison of log-log survival curves

1. Describe interaction of Rx and Sex.

2. Describe violation of PH assumption for Sex.

VII. Summary
Practice Exercises


The following questions derive from the dataset vets.dat
concerning the Veteran's Administration Lung Cancer Trial that
we previously considered in the presentation on the stratified
Cox model. Recall that survival times are in days and that
the study size contains 137 patients. The exposure variable
of interest is treatment status (standard = 1, test = 2). Other
variables of interest as control variables are cell type (four
types, defined in terms of dummy variables), performance
status, disease duration, age, and prior therapy status.
Failure status is defined by the status variable (0 = censored,
1 = died).

1. Consider the following two edited printouts obtained
from fitting a Cox PH model to these data.

Cox regression
Analysis time _t: survt

             Coef.    Std. Err.  P > |z|  Haz. Ratio  [95% Conf. Interval]  P(PH)
Treatment     0.290   0.207      0.162    1.336       0.890    2.006        0.628
Large cell    0.400   0.283      0.157    1.491       0.857    2.594        0.033
Adeno cell    1.188   0.301      0.000    3.281       1.820    5.915        0.081
Small cell    0.856   0.275      0.002    2.355       1.374    4.037        0.078
Perf.Stat    -0.033   0.006      0.000    0.968       0.958    0.978        0.000
Dis.Durat.    0.000   0.009      0.992    1.000       0.982    1.018        0.919
Age          -0.009   0.009      0.358    0.991       0.974    1.010        0.198
Pr.Therapy    0.007   0.023      0.755    1.007       0.962    1.054        0.145

No. of subjects = 137   Log likelihood = -475.180


Cox regression
Analysis time _t: survt

             Coef.    Std. Err.  P > |z|  Haz. Ratio  [95% Conf. Interval]  P(PH)
Treatment     0.298   0.197      0.130    1.347       0.916    1.981        0.739
Small cell    0.392   0.210      0.062    1.481       0.981    2.235        0.382
Perf.Stat    -0.033   0.005      0.000    0.968       0.958    0.978        0.000
Dis.Durat.   -0.001   0.009      0.887    0.999       0.981    1.017        0.926
Age          -0.006   0.009      0.511    0.994       0.976    1.012        0.211
Pr.Therapy   -0.003   0.023      0.884    0.997       0.954    1.042        0.146

No. of subjects = 137   Log likelihood = -487.770


How do the printouts differ in terms of what the P(PH)
information says about which variables do not satisfy
the PH assumption?

2. Based on the above information, if you were going to
stratify on the cell type variable, how would you define
the strata? Explain.

3. Consider a stratified analysis that stratifies on the
variables $Z_1$ = “small cell” and $Z_2$ = “performance status.”
The small cell variable is one of the dummy variables for
cell type defined above. The performance status variable
is dichotomized into high (60 or above) and low (below
60) and is denoted as PSbin. The stratification variable
which combines categories from $Z_1$ and $Z_2$ is denoted as
SZ* and consists of four categories. The predictors
included (but not stratified) in the analysis are treatment
status, disease duration, age, and prior therapy. The
computer results are as follows:

Stratified Cox regression
Analysis time _t: survt

             Coef.    Std. Err.  P > |z|  Haz. Ratio  [95% Conf. Interval]
Treatment     0.090   0.197      0.647    1.095       0.744    1.611
Dis.Durat.    0.000   0.010      0.964    1.000       0.982    1.019
Age           0.002   0.010      0.873    1.002       0.983    1.021
Pr.Therapy   -0.010   0.023      0.656    0.990       0.947    1.035

No. of subjects = 137   Log likelihood = -344.848   Stratified by SZ*


Based on these results, describe the point and interval 
estimates for the hazard ratio for the treatment effect ad¬ 
justed for the other variables, including SZ*. Is this haz¬ 
ard ratio meaningfully and/or statistically significant? 
Explain. 

4. State the form of the hazard function for the model being 
fit in question 3. Why does this model assume no interac¬ 
tion between the stratified variables and the predictors 
in the model? 

5. State two alternative ways to write the hazard function 
for an “interaction model” that allows for the interac¬ 
tion of the stratified variables with the treatment status 
variable, but assumes no other type of interaction. 

6. State two alternative versions of the hazard function for 
an interaction model that allows for the interaction of 
the stratified variables (small cell and performance sta¬ 
tus) with each of the predictors treatment status, disease 
duration, age, and prior therapy. 

7. For the interaction model described in question 6, what 
is the formula for the hazard ratio for the effect of treat¬ 
ment adjusted for the other variables? Does this formula 
give a different hazard ratio for different strata? Explain. 
8. State two alternative versions of the null hypothesis for
testing whether the no-interaction assumption is satis¬ 
fied for the stratified Cox model. Note that one of these 
versions should involve a set of regression coefficients 
being set equal to zero. 

9. State the form of the likelihood ratio statistic for evaluat¬ 
ing the no-interaction assumption. How is this statistic 
distributed under the null hypothesis, and with what de¬ 
grees of freedom? 

10. Provided below are computer results for fitting the
interaction model described in question 6. In this
printout the variable $Z_1^*$ denotes the small cell variable and
the variable $Z_2^*$ denotes the PSbin variable. The variable
DDZ1* denotes the product of $Z_1^*$ with disease duration,
and other product terms are defined similarly.

Stratified Cox regression
Analysis time _t: survt

             Coef.    Std. Err.  P > |z|  Haz. Ratio  [95% Conf. Interval]
Treatment     0.381   0.428      0.374    1.464       0.632    3.389
Dis.Durat.    0.015   0.021      0.469    1.015       0.975    1.057
Age           0.000   0.017      0.994    1.000       0.968    1.033
Pr.Therapy    0.023   0.041      0.571    1.023       0.944    1.109
DDZ1*        -0.029   0.024      0.234    0.971       0.926    1.019
AgeZ1*       -0.055   0.037      0.135    0.946       0.880    1.018
PTZ1*         0.043   0.075      0.564    1.044       0.901    1.211
DDZ2*         0.025   0.032      0.425    1.026       0.964    1.092
AgeZ2*        0.001   0.024      0.956    1.001       0.956    1.049
PTZ2*        -0.078   0.054      0.152    0.925       0.831    1.029
DDZ1*Z2*     -0.071   0.059      0.225    0.931       0.830    1.045
AgeZ1*Z2*     0.084   0.049      0.084    1.088       0.989    1.196
PTZ1*Z2*     -0.005   0.117      0.963    0.995       0.791    1.250
trZ1*         0.560   0.732      0.444    1.751       0.417    7.351
trZ2*        -0.591   0.523      0.258    0.554       0.199    1.543
trZ1*Z2*     -0.324   0.942      0.731    0.723       0.114    4.583

No. of subjects = 137   Log likelihood = -335.591   Stratified by SZ*


Use the above computer results to state the form of the
estimated hazard model for each of the four strata of the
stratification variable SZ*. Also, for each stratum, compute
the hazard ratio for the treatment effect adjusted for
disease duration, age, and prior therapy.
11. Carry out the likelihood ratio test to evaluate the
no-interaction model described in question 4. In carrying
out this test, make sure to state the null hypothesis in
terms of regression coefficients being set equal to zero in
the interaction model fitted in question 10. Also,
determine the p-value for this test and state your conclusions
about significance as well as which model you prefer, the
no-interaction model or the interaction model.

12. The adjusted log-log survival curves for each of the four
strata defined by the stratification variable SZ* (adjusted
for treatment status, disease duration, age, and prior
therapy) are presented below.

[Figure: adjusted log-log survival curves for the four strata of SZ*]

Using this graph, what can you conclude about whether
the PH assumption is satisfied for the variables, small
cell type and PSbin?

13. Comment on what you think can be learned by graphing
adjusted survival curves that compare the two treatment
groups for each of the four strata of SZ*.


Test

The following questions consider a dataset from a study 
by Caplehorn et al. (“Methadone Dosage and Retention of 
Patients in Maintenance Treatment,” Med. J. Aust., 1991). 
These data comprise the times in days spent by heroin addicts 
from entry to departure from one of two methadone clinics. 
Two other covariates, namely, prison record and maximum 
methadone dose, are believed to affect the survival times. The 
dataset name is addicts.dat. A listing of the variables is given 
below: 

Column 1: Subject ID 

Column 2: Clinic (1 or 2) 
Column 3: Survival status (0 = censored, 1 = departed
from clinic)

Column 4: Survival time in days 

Column 5: Prison record (0 = none, 1 = any) 

Column 6: Maximum methadone dose (mg/day) 

1. The following edited printout was obtained from fitting a 
Cox PH model to these data: 


Cox regression
Analysis time _t: survt

          Coef.    Std. Err.  P > |z|  Haz. Ratio  [95% Conf. Interval]  P(PH)
clinic   -1.009    0.215      0.000    0.365       0.239    0.556        0.001
prison    0.327    0.167      0.051    1.386       0.999    1.924        0.332
dose     -0.035    0.006      0.000    0.965       0.953    0.977        0.341

No. of subjects = 238   Log likelihood = -673.403

Based on the P(PH) information in the above printout, it 
appears that clinic does not satisfy the PH assumption; 
this conclusion is also supported by comparing log-log 
curves for the two clinics and noticing strong nonparal¬ 
lelism. What might we learn from fitting a stratified Cox 
(SC) model stratifying on the clinic variable? What is a 
drawback to using a SC procedure that stratifies on the 
clinic variable? 

2. The following printout was obtained from fitting a SC PH 
model to these data, where the variable being stratified is 
clinic: 

Stratified Cox regression
Analysis time _t: survt

          Coef.    Std. Err.  P > |z|  Haz. Ratio  [95% Conf. Interval]
Prison    0.389    0.169      0.021    1.475       1.059    2.054
Dose     -0.035    0.006      0.000    0.965       0.953    0.978

No. of subjects = 238   Log likelihood = -597.714   Stratified by clinic

Using the above fitted model, we can obtain the adjusted 
curves below that compare the adjusted survival probabil¬ 
ities for each clinic (i.e., stratified by clinic) adjusted for 
the variables, prison and maximum methadone dose. 

[Figure: adjusted survival curves for clinic 1 and clinic 2, stratified by clinic]

Based on these adjusted survival curves, what conclusions
can you draw about whether the survival experience is
different between the two clinics? Explain.

3. State the hazard function model being estimated in 
the above computer results. Why is this model a no¬ 
interaction model? 

4. Using the above computer results, provide point and inter¬ 
val estimates for the effect of prison adjusted for clinic and 
dose. Is this adjusted prison effect significant? Explain. 

5. The following computer results consider a SC model that 
allows for interaction of the stratified variable clinic with 
each of the predictors, prison and dose. Product terms 
in the model are denoted as clinpr = clinic x prison and 
clindos = clinic x dose. 

Stratified Cox regression
Analysis time _t: survt

           Coef.    Std. Err.  P > |z|  Haz. Ratio  [95% Conf. Interval]
prison     1.087    0.539      0.044    2.966       1.032    8.523
dose      -0.035    0.020      0.079    0.966       0.929    1.004
clinpr    -0.585    0.428      0.172    0.557       0.241    1.290
clindos   -0.001    0.015      0.942    0.999       0.971    1.028

No. of subjects = 238   Log likelihood = -596.779   Stratified by clinic


State two alternative versions of the interaction model be¬ 
ing estimated by the above printout, where one of these 
versions should involve the product terms used in the 
above printout. 

6. Using the computer results above, determine the esti¬ 
mated hazard models for each clinic. (Note that the clinics 
are coded as 1 or 2.) 
7. Below are the adjusted survival curves for each clinic
based on the interaction model results above. These
curves are adjusted for the prison and dose variables.

[Figure: adjusted survival curves by clinic from the interaction model]

Compare the survival curves by clinic obtained for the
interaction model with the corresponding curves
previously shown for the no-interaction model. Do both sets
of curves indicate similar conclusions about the clinic
effect? Explain.

8. Carry out a likelihood ratio test to determine whether the 
no-interaction model is appropriate. In doing so, make use 
of the computer information described above, state the 
null hypothesis, state the form of the likelihood statistic 
and its distribution under the null hypothesis, and com¬ 
pute the value of the likelihood statistic and evaluate its 
significance. What are your conclusions? 


Answers to Practice Exercises


1. The first printout indicates that the variables large cell, 
adeno cell, small cell, and performance status do not sat¬ 
isfy the PH assumption at the 0.10 level. The second print¬ 
out considers a different model that does not contain the 
large cell and adeno cell variables. This latter printout in¬ 
dicates that small cell satisfies the PH assumption, in con¬ 
trast to the first printout. The performance status variable, 
however, does not satisfy the PH assumption as in the first 
printout. 

2. The cell type variable is defined to have four categories, 
as represented by the three dummy variables in the first 
printout. The “small cell” variable dichotomizes the cell 
type variable into the categories small cell type versus the 
rest. From the second printout, the small cell variable does 
not appear by itself to violate the PH assumption. This re¬ 
sult conflicts with the results of the first printout, for which 
the cell type variable considered in four categories does not 
satisfy the PH assumption at the 0.10 level of significance.
We therefore think it is more appropriate to use a SC pro¬ 
cedure only if four strata are to be used. A drawback to 
using four strata, however, is that the number of survival 
curves to be plotted is larger than for two strata; conse¬ 
quently, a large number of curves is more difficult to in¬ 
terpret graphically than when there are only two curves. 
Thus, for convenience of interpretation, we may choose 
to dichotomize the cell type variable instead of consider¬ 
ing four strata. We may also consider dichotomies other 
than those defined by the small cell variable. For instance, 
we might consider dichotomizing on either the adeno or 
large cell variables instead of the small cell variable. Al¬ 
ternatively, we may combine categories so as to compare, 
say, large and adeno cell types with small and squamous 
types. However, a decision to combine categories should 
not be just a statistical decision, but should also be based 
on biologic considerations. 

3. $\widehat{HR}_{adj}$ = 1.095, 95% CI: (0.744, 1.611), two-tailed P-value is
0.647, not significant. The estimated hazard ratio for
treatment is neither meaningfully nor statistically significant.
The point estimate is essentially 1, which says that there is
no meaningful effect of treatment adjusted for the
predictors in the model and for the stratified predictor SZ*.

4. $h_g(t, X) = h_{0g}(t)\exp[\beta_1 Treatment + \beta_2 DD + \beta_3 Age + \beta_4 PT]$,
g = 1, ..., 4, where the strata are defined from
the stratification variable SZ*, DD = disease duration,
and PT = prior therapy. This model assumes no
interaction because the coefficient of each predictor in the model
is not subscripted by g, i.e., the regression coefficients are
the same for each stratum.

5. Version 1: $h_g(t, X) = h_{0g}(t)\exp[\beta_{1g} Treatment + \beta_2 DD + \beta_3 Age + \beta_4 PT]$,
g = 1, ..., 4.

Version 2: $h_g(t, X) = h_{0g}(t)\exp[\beta_1 Treatment + \beta_2 DD + \beta_3 Age + \beta_4 PT$
$+ \beta_5(Z_1^* \times Treatment) + \beta_6(Z_2^* \times Treatment)$
$+ \beta_7(Z_1^* \times Z_2^* \times Treatment)]$,
where $Z_1^*$ = small cell type (0, 1), $Z_2^*$ = PSbin (0, 1),
and g = 1, ..., 4.

6. Version 1: $h_g(t, X) = h_{0g}(t)\exp[\beta_{1g} Treatment + \beta_{2g} DD + \beta_{3g} Age + \beta_{4g} PT]$,
g = 1, ..., 4.

Version 2: $h_g(t, X) = h_{0g}(t)\exp[\beta_1 Treatment + \beta_2 DD + \beta_3 Age + \beta_4 PT$
$+ \beta_5(Z_1^* \times Treatment) + \beta_6(Z_1^* \times DD) + \beta_7(Z_1^* \times Age) + \beta_8(Z_1^* \times PT)$
$+ \beta_9(Z_2^* \times Treatment) + \beta_{10}(Z_2^* \times DD) + \beta_{11}(Z_2^* \times Age) + \beta_{12}(Z_2^* \times PT)$
$+ \beta_{13}(Z_1^* \times Z_2^* \times Treatment) + \beta_{14}(Z_1^* \times Z_2^* \times DD)$
$+ \beta_{15}(Z_1^* \times Z_2^* \times Age) + \beta_{16}(Z_1^* \times Z_2^* \times PT)]$,
g = 1, ..., 4.

7. $HR_g = \exp(\beta_{1g})$, using version 1 model form. Yes, this
formula gives different hazard ratios for different strata
because the value of the hazard ratio changes with the
subscript g.

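8. $H_0$: No interaction assumption is satisfied.

$H_0$: $\beta_{11} = \beta_{12} = \beta_{13} = \beta_{14}$, $\beta_{21} = \beta_{22} = \beta_{23} = \beta_{24}$,
$\beta_{31} = \beta_{32} = \beta_{33} = \beta_{34}$, $\beta_{41} = \beta_{42} = \beta_{43} = \beta_{44}$
from version 1.

$H_0$: $\beta_5 = \beta_6 = \beta_7 = \beta_8 = \beta_9 = \beta_{10} = \beta_{11} = \beta_{12}$
$= \beta_{13} = \beta_{14} = \beta_{15} = \beta_{16} = 0$ from version 2.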

9. $LR = -2 \ln L_R - (-2 \ln L_F)$, where R denotes the reduced
(no-interaction) model and F denotes the full (interaction)
model. Under the null hypothesis, LR is approximately a
chi-square with 12 degrees of freedom.

10. Estimated hazard models for each stratum:

g = 1; $Z_1^* = Z_2^* = 0$:
$\hat{h}_1(t, X) = \hat{h}_{01}(t)\exp[(0.381)Treatment + (0.015)DD + (0.000)Age + (0.023)PT]$

g = 2; $Z_1^* = 1$, $Z_2^* = 0$:
$\hat{h}_2(t, X) = \hat{h}_{02}(t)\exp[(0.941)Treatment + (-0.014)DD + (-0.055)Age + (0.066)PT]$

g = 3; $Z_1^* = 0$, $Z_2^* = 1$:
$\hat{h}_3(t, X) = \hat{h}_{03}(t)\exp[(-0.210)Treatment + (0.040)DD + (0.001)Age + (-0.055)PT]$

g = 4; $Z_1^* = 1$, $Z_2^* = 1$:
$\hat{h}_4(t, X) = \hat{h}_{04}(t)\exp[(0.026)Treatment + (-0.060)DD + (0.030)Age + (-0.017)PT]$

Estimated hazard ratios for treatment effect adjusted for
DD, Age, and PT:

g = 1: $\widehat{HR}_1 = \exp(0.381) = 1.464$
g = 2: $\widehat{HR}_2 = \exp(0.941) = 2.563$
g = 3: $\widehat{HR}_3 = \exp(-0.210) = 0.811$
g = 4: $\widehat{HR}_4 = \exp(0.026) = 1.026$

11. $H_0$: $\beta_5 = \beta_6 = \beta_7 = \beta_8 = \beta_9 = \beta_{10} = \beta_{11} = \beta_{12} = \beta_{13} = \beta_{14} = \beta_{15} = \beta_{16} = 0$ 

LR = 689.696 − 671.182 = 18.514, which is approximately 
chi-square with 12 df. 

P = 0.101, which is not significant below the .05 level. 
Conclusion: Accept the null hypothesis and conclude that 
the no-interaction model is preferable to the interaction 
model. 
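
The arithmetic in answer 11 can be checked with a short Python sketch. It uses only the two −2 ln L values quoted above; the helper name `chi2_sf_even_df` is introduced here for illustration, and it relies on the closed-form upper-tail probability of a chi-square distribution that holds when the degrees of freedom are even.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi-square_df > x), valid for even df only.

    For df = 2k the tail has the closed form
    exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j      # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


# -2 ln L values quoted in answer 11
neg2_loglik_reduced = 689.696   # no-interaction model
neg2_loglik_full = 671.182      # interaction model

lr = neg2_loglik_reduced - neg2_loglik_full
p_value = chi2_sf_even_df(lr, 12)
print(f"LR = {lr:.3f}, P = {p_value:.3f}")
```

The printed P-value should agree with the 0.101 reported above to three decimal places.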

12. The three curves at the bottom of the graph appear to be 
quite non-parallel. Thus, the PH assumption is not satis¬ 
fied for one or both of the variables, small cell type and 
PSbin. Note, however, that because both these variables 
have been stratified together, it is not clear from the graph 
whether only one of these variables fails to satisfy the PH 
assumption. 

13. If we graph adjusted survival curves that compare the two 
treatment groups for each of the four strata, we will be 
able to see graphically how the treatment effect, if any, 
varies over time within each stratum. The difficulty with this 
approach, however, is that eight adjusted survival curves 
will be produced, so that if all eight curves are put on the 
same graph, it may be difficult to see what is going on. 

Introduction 


Abbreviated 

Outline 


We begin by defining a time-dependent variable and providing 
some examples of such a variable. We also state the general 
formula for a Cox model that is extended to allow time depen¬ 
dent variables, followed by a discussion of the characteristics 
of this model, including a description of the hazard ratio. 

In the remainder of the presentation, we give examples of 
models with time-dependent variables, including models that 
allow for checking the PH assumption for time-independent 
variables. In particular, we describe a method that uses “heav- 
iside functions” to evaluate the PH assumption for time- 
independent variables. We also describe two computer appli¬ 
cations of the extended Cox model, one concerning a study on 
the treatment of heroin addiction and the other concerning 
the Stanford heart transplant study. 


The outline below gives the user a preview of the material to 
be covered by the presentation. A detailed outline for review 
purposes follows the presentation. 

I. Preview (page 214) 

II. Review of the Cox PH Model (pages 214-216) 

III. Definition and Examples of Time-Dependent 
Variables (pages 216-219) 

IV. The Extended Cox Model for Time-Dependent 
Variables (pages 219-221) 

V. The Hazard Ratio Formula for the Extended Cox 
Model (pages 221-223) 

VI. Assessing Time-Independent Variables That Do 
Not Satisfy the PH Assumption (pages 224-229) 

VII. An Application of the Extended Cox Model to an 
Epidemiologic Study on the Treatment of Heroin 
Addiction (pages 230-234) 

VIII. An Application of the Extended Cox Model to the 
Analysis of the Stanford Heart Transplant Data 
(pages 235-239) 

IX. The Extended Cox Likelihood (pages 239-242) 

X. Summary (pages 242-245) 

Objectives 


Upon completing the chapter, the learner should be able to: 

1. State or recognize the general form of the Cox model ex¬ 
tended for time-dependent variables. 

2. State the specific form of an extended Cox model appro¬ 
priate for the analysis, given a survival analysis scenario 
involving one or more time-dependent variables. 

3. State the formula for a designated hazard ratio of interest, 
given a scenario describing a survival analysis using an 
extended Cox model. 

4. State the formula for an extended Cox model that provides 
a method for checking the PH assumption for one or 
more of the time-independent variables in the model, given 
a scenario describing a survival analysis involving 
time-independent variables. 

5. State the formula for an extended Cox model that uses 
one or more heaviside functions to check the PH assumption 
for one or more of the time-independent variables in the 
model, given a scenario describing a survival analysis 
involving time-independent variables. 

6. State the formula for the hazard ratio during different time 
interval categories specified by the heaviside function(s), 
for a model involving heaviside function(s). 

7. Carry out an appropriate analysis of the data to evaluate 
the effect of one or more of the explanatory variables in 
the model(s) being used, given computer results for a sur¬ 
vival analysis involving time-dependent variables. Such an 
analysis will involve: 

• computing and interpreting any hazard ratio(s) of 
interest; 

• carrying out and interpreting appropriate test(s) of 
hypotheses for effects of interest; 

• obtaining confidence intervals for hazard ratios of 
interest; 

• evaluating interaction and confounding involving one 
or more covariates. 

Presentation 


I. Preview 



This presentation describes how the Cox propor¬ 
tional hazards (PH) model can be extended to al¬ 
low time-dependent variables as predictors. Here, 
we focus on the model form, characteristics of this 
model, the formula for and interpretation of the 
hazard ratio, and examples of the extended Cox 
model. We also show how the extended Cox model 
can be used to check the PH assumption for 
time-independent variables, and we provide computer 
applications to illustrate different types of 
time-dependent variables. Finally, we describe the 
extended Cox likelihood and how it contrasts with 
the Cox PH likelihood function. 


II. Review of the Cox 
PH Model 


$h(t,\mathbf{X}) = h_0(t)\exp\left[\sum_{i=1}^{p}\beta_i X_i\right]$ 

$\mathbf{X} = (X_1, X_2, \ldots, X_p)$ 
Explanatory/predictor variables 


The general form of the Cox PH model is shown 
here. This model gives an expression for the haz¬ 
ard at time t for an individual with a given spec¬ 
ification of a set of explanatory variables denoted 
by the bold X. That is, the bold X represents a col¬ 
lection (sometimes called a “vector”) of predictor 
variables that is being modeled to predict an indi¬ 
vidual’s hazard. 


$h_0(t) \times \exp\left[\sum_{i=1}^{p}\beta_i X_i\right]$ 

Baseline hazard: involves t but not X's 

Exponential: involves X's but not t (X's are 
time-independent) 

The Cox model formula says that the hazard at 
time t is the product of two quantities. The first 
of these, $h_0(t)$, is called the baseline hazard 
function. The second quantity is the exponential 
expression e to the linear sum of $\beta_i X_i$, where the 
sum is over the p explanatory X variables. 

An important feature of this formula, which con¬ 
cerns the proportional hazards (PH) assumption, 
is that the baseline hazard is a function of t but 
does not involve the X’s, whereas the exponential 
expression involves the X’s but does not involve t. 
The X's here are called time-independent X's. 

X's involving t: time-dependent 

Requires extended Cox model 
(no PH) 


It is possible, nevertheless, to consider X’s that 
do involve t. Such X’s are called time-dependent 
variables. If time-dependent variables are consid¬ 
ered, the Cox model form may still be used, but 
such a model no longer satisfies the PH assump¬ 
tion and is called the extended Cox model. We 
will discuss time-dependent variables and the cor¬ 
responding extended Cox model beginning in the 
next section. 


Hazard ratio formula: 


$\widehat{HR} = \exp\left[\sum_{i=1}^{p}\hat{\beta}_i (X_i^* - X_i)\right]$ 



From the Cox PH model, we can obtain a gen¬ 
eral formula, shown here, for estimating a hazard 
ratio that compares two specifications of the X’s, 
defined as X* and X. 


where $\mathbf{X}^* = (X_1^*, X_2^*, \ldots, X_p^*)$ and 
$\mathbf{X} = (X_1, X_2, \ldots, X_p)$ denote the 
two sets of X's. 


PH assumption: 
$\dfrac{\hat{h}(t,\mathbf{X}^*)}{\hat{h}(t,\mathbf{X})} = \hat{\theta}$ (a constant over t) 

i.e., $\hat{h}(t,\mathbf{X}^*) = \hat{\theta}\,\hat{h}(t,\mathbf{X})$ 


The (PH) assumption underlying the Cox PH 
model is that the hazard ratio comparing any 
two specifications of X predictors is constant over 
time. Equivalently, this means that the hazard for 
one individual is proportional to the hazard for 
any other individual, where the proportionality 
constant is independent of time. 


Hazards cross ⇒ PH not met 

Hazards don't cross ⇏ PH met 

An example of when the PH assumption is not met 
is given by any study situation in which the hazards 
for two or more groups cross when graphed 
against time. However, even if the hazard functions 
do not cross, it is possible that the PH assumption 
is not met. 


Three approaches: 

• graphical 

• time-dependent variables 

• goodness-of-fit test 


As described in more detail in Chapter 4, there 
are three general approaches for assessing the PH 
assumption. These are 

• a graphical approach; 

• the use of time-dependent variables in an ex¬ 
tended Cox model; and 

• the use of a goodness-of-fit test. 


Time-dependent covariates: 

Extend Cox model: add product term(s) involving some 
function of time. 

When time-dependent variables are used to assess 
the PH assumption for a time-independent variable, 
the Cox model is extended to contain product 
(i.e., interaction) terms involving the 
time-independent variable being assessed and some 
function of time. 

Options when PH assumption not 
satisfied: 

• Use a stratified Cox (SC) model. 

• Use time-dependent variables. 


Time-dependent variables may be: 

• inherently time-dependent 

• defined to analyze a time- 
independent predictor not 
satisfying the PH assumption. 


For example, if the PH assumption is being as¬ 
sessed for gender, a Cox model might be extended 
to include the variable sex x t in addition to sex. 
If the coefficient of the product term turns out to 
be non-significant, we can conclude that the PH 
assumption is satisfied for sex provided that the 
variable sex x t is an appropriate choice of time- 
dependent variable. 
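
Written out, the check just described might look as follows. The symbols β (main effect of sex) and δ (coefficient of the product term) are chosen here for illustration, following the notation used for the extended Cox model elsewhere in this chapter:

```latex
h(t,\mathbf{X}(t)) = h_0(t)\exp\left[\beta\,\text{sex} + \delta\,(\text{sex}\times t)\right],
\qquad H_0\colon \delta = 0
```

Under $H_0$ the product term drops out and the model reduces to the Cox PH model containing sex alone.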

There are two options to consider if the PH as¬ 
sumption is not satisfied for one or more of the 
predictors in the model. In Chapter 5, we de¬ 
scribed the option of using a stratified Cox (SC) 
model, which stratifies on the predictor(s) not sat¬ 
isfying the PH assumption, while keeping in the 
model those predictors that satisfy the PH as¬ 
sumption. In this chapter, we describe the other 
option, which involves using time-dependent vari¬ 
ables. 

Note that a given study may consider predictors 
that are inherently defined as time-dependent, as 
we will illustrate in the next section. Thus, in addi¬ 
tion to considering time-dependent variables as an 
option for analyzing a time-independent variable 
not satisfying the PH assumption, we also discuss 
predictors which are inherently defined as time- 
dependent. 


III. Definition and Examples 
of Time-Dependent 
Variables 

Definition: 

Time-dependent: value of variable differs over time 

Time-independent: value of variable is constant over time 

Example: 

Race × t (time-dependent); Race (time-independent) 

A time-dependent variable is defined as any vari¬ 
able whose value for a given subject may differ 
over time (t ). In contrast, a time-independent vari¬ 
able is a variable whose value for a given subject 
remains constant over time. 

As a simple example, the variable RACE is a 
time-independent variable, whereas the variable 
RACE x time is a time-dependent variable. 

The variable RACE x time is an example of what is 
called a “defined” time-dependent variable. Most 
defined variables are of the form of the product of a 
time-independent variable (e.g., RACE) multiplied 
by time or some function of time. Note that after 
RACE is determined for a given subject, all the val¬ 
ues of the RACE x time variable are completely 
defined over a specified time interval of study. 

A second example of a defined variable is given by 
E × (log t − 3), where E denotes, say, a (0,1) exposure 
status variable determined at one's entry into 
the study. Notice that here we have used a function 
of time, namely log t − 3, rather than time 
alone. 

Yet another example of a defined variable, which 
also involves a function of time, is given by E × 
g(t), where g(t) is defined to take on the value 1 if 
t is greater than or equal to some specified value 
of t, called $t_0$, and takes on the value 0 if t is less 
than $t_0$. 

The function g(t) is called a “heaviside” function. 
Note that whenever t is greater than or equal to 
$t_0$, g(t) equals 1, so E × g(t) = E; however, whenever 
t is less than $t_0$, g(t) = 0, so the value of 
E × g(t) is always 0. We will later return to illustrate 
how heaviside functions may be used as one 
method for the analysis when a time-independent 
variable like E does not satisfy the proportional 
hazards assumption. 
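
As a minimal sketch of the definition above, the heaviside function and the defined variable E × g(t) can be written in a few lines of Python. The function names and the cutpoint value are hypothetical and chosen only for illustration.

```python
def heaviside(t, t0):
    """g(t): equals 1 when t >= t0 and 0 when t < t0."""
    return 1 if t >= t0 else 0


def defined_covariate(exposure, t, t0):
    """Value at time t of the defined time-dependent variable E x g(t)."""
    return exposure * heaviside(t, t0)


t0 = 10  # hypothetical cutpoint, not taken from the text
for t in (5, 10, 15):
    exposed = defined_covariate(1, t, t0)
    unexposed = defined_covariate(0, t, t0)
    print(f"t = {t}: E=1 gives {exposed}, E=0 gives {unexposed}")
```

For an exposed subject the product switches from 0 to E = 1 at the cutpoint; for an unexposed subject it is 0 throughout.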


Internal variable: 


EXAMPLES OF INTERNAL 
VARIABLES 


E(t), EMP(t), SMK(t), OBS(t) 


Another type of time-dependent variable is called 
an “internal” variable. Examples of such a variable 
include exposure level E at time t, employment 
status ( EMP ) at time t, smoking status ( SMK ) at 
time t, and obesity level ( OBS ) at time t. 


Values change because of “internal” 
characteristics or behavior of the individual. 

All these examples consider variables whose values 
may change over time for any subject under 
study; moreover, for internal variables, the reason 
for a change in value depends on “internal” 
characteristics or behavior specific to the individual. 

“Ancillary” variable: 

Value changes because of “external” 
characteristics. 

In contrast, a variable is called an “ancillary” variable 
if its value changes primarily because of “external” 
characteristics of the environment that 
may affect several individuals simultaneously. An 
example of an ancillary variable is air pollution 
index at time t for a particular geographical area. 
Another example is employment status (EMP) at 
time t, if the primary reason for whether someone 
is employed or not depends more on general 
economic circumstances than on individual 
characteristics. 

As another example, which may be part internal 
and part ancillary, we consider heart transplant 
status (HT) at time t for a person identified to have 
a serious heart condition, making him or her 
eligible for a transplant. The value of this variable 
HT at time t is 1 if the person has already received 
a transplant at some time, say $t_0$, prior to time t. 
The value of HT is 0 at time t if the person has not 
yet received a transplant by time t. 

Note that once a person receives a transplant, 
at time $t_0$, the value of HT remains at 1 for all 
subsequent times. Thus, for a person receiving a 
transplant, the value of HT is 0 up to the time of 
transplant, and then remains at 1 thereafter. In 
contrast, a person who never receives a transplant 
has HT equal to 0 for all times during the period 
he or she is in the study. 

The variable “heart transplant status,” HT(t), can 
be considered essentially an internal variable, be¬ 
cause individual traits of an eligible transplant re¬ 
cipient are important determinants of the decision 
to carry out transplant surgery. Nevertheless, the 
availability of a donor heart prior to tissue and 
other matching with an eligible recipient can be 
considered an “ancillary” characteristic external 
to the recipient. 


EXAMPLES OF ANCILLARY 
VARIABLES 


Air pollution index at time t; EMP(t) 

Computer commands differ for 
defined vs. internal vs. ancillary. 

But, the form of extended Cox 
model and procedures for analysis 
are the same regardless of variable 
type. 


The primary reason for distinguishing among de¬ 
fined, internal, or ancillary variables is that the 
computer commands required to define the vari¬ 
ables for use in an extended Cox model are some¬ 
what different for the different variable types, 
depending on the computer program used. Never¬ 
theless, the form of the extended Cox model is the 
same regardless of variable type, and the proce¬ 
dures for obtaining estimates of regression coeffi¬ 
cients and other parameters, as well as for carrying 
out statistical inferences, are also the same. 


IV. The Extended Cox Model 
for Time-Dependent 
Variables 


$h(t,\mathbf{X}(t)) = h_0(t)\exp\left[\sum_{i=1}^{p_1}\beta_i X_i + \sum_{j=1}^{p_2}\delta_j X_j(t)\right]$ 

$\mathbf{X}(t) = (X_1, X_2, \ldots, X_{p_1}, X_1(t), X_2(t), \ldots, X_{p_2}(t))$ 

Time-independent: $X_1, X_2, \ldots, X_{p_1}$ 

Time-dependent: $X_1(t), X_2(t), \ldots, X_{p_2}(t)$ 


Given a survival analysis situation involving both 
time-independent and time-dependent predictor 
variables, we can write the extended Cox model 
that incorporates both types as shown here at the 
left. As with the Cox PH model, the extended model 
contains a baseline hazards function $h_0(t)$ which 
is multiplied by an exponential function. However, 
in the extended model, the exponential part 
contains both time-independent predictors, as 
denoted by the $X_i$ variables, and time-dependent 
predictors, as denoted by the $X_j(t)$ variables. The 
entire collection of predictors at time t is denoted by 
the bold X(t). 



As a simple example of an extended Cox model, 
we show here a model with one time-independent 
variable and one time-dependent variable. The 
time-independent variable is exposure status E, 
say a (0,1) variable, and the time-dependent 
variable is the product term E × t. 


Estimating regression 
coefficients: 

ML procedure: 

Maximize (partial) L. 

Risk sets more complicated than for 
PH model. 


As with the simpler Cox PH model, the regression 
coefficients in the extended Cox model are esti¬ 
mated using a maximum likelihood (ML) proce¬ 
dure. ML estimates are obtained by maximizing a 
(partial) likelihood function L. However, the com¬ 
putations for the extended Cox model are more 
complicated than for the Cox PH model, because 
the risk sets used to form the likelihood function 
are more complicated with time-dependent vari¬ 
ables. The extended Cox likelihood is described 
later in this chapter. 










220 6. Extension of the Cox Proportional Hazards Model 


Computer programs for the 
extended Cox model: 


Stata (Stcox) 

SAS (PHREG) 
SPSS (COXREG) 


Computer 

Appendix 


Computer packages that include programs for fit¬ 
ting the extended Cox model include Stata, SAS, 
and SPSS. See the Computer Appendix at the end 
of this text for a comparison of the Stata, SAS, and 
SPSS procedures applied to the same dataset. 


Statistical inferences: 

Wald and/or LR tests 

Large sample confidence intervals 


Methods for making statistical inferences are es¬ 
sentially the same as for the PH model. That is, one 
can use Wald and/or likelihood ratio ( LR ) tests and 
large sample confidence interval methods. 


Assumption of the model: 

The hazard at time t depends on the 
value of Xj ( t ) at that same time. 


An important assumption of the extended Cox 
model is that the effect of a time-dependent variable 
$X_j(t)$ on the survival probability at time t depends 
on the value of this variable at that same 
time t, and not on the value at an earlier or later 
time. 


$h(t,\mathbf{X}(t)) = h_0(t)\exp\left[\sum_{i=1}^{p_1}\beta_i X_i + \sum_{j=1}^{p_2}\delta_j X_j(t)\right]$ 

One coefficient ($\delta_j$) for $X_j(t)$ 


Note that even though the values of the variable 
$X_j(t)$ may change over time, the hazard 
model provides only one coefficient for each 
time-dependent variable in the model. Thus, at time t, 
there is only one value of the variable $X_j(t)$ that 
has an effect on the hazard, that value being 
measured at time t. 


Can modify for lag-time effect 

It is possible, nevertheless, to modify the definition 
of the time-dependent variable to allow for a 
“lag-time” effect. 


Lag-time effect: 


EXAMPLE 


EMP(t) = employment status at week t 

Model without lag-time: 
$h(t,\mathbf{X}(t)) = h_0(t)\exp[\delta\,EMP(t)]$ (same week) 

Model with 1-week lag-time: 
$h(t,\mathbf{X}(t)) = h_0(t)\exp[\delta^*\,EMP(t-1)]$ (one week earlier) 


To illustrate the idea of a lag-time effect, suppose, 
for example, that employment status, measured 
weekly and denoted as EMP(t), is the time- 
dependent variable being considered. Then, an ex¬ 
tended Cox model that does not consider lag-time 
assumes that the effect of employment status on 
the probability of survival at week t depends on the 
observed value of this variable at the same week t, 
and not, for example, at an earlier week. 

However, to allow for, say, a time-lag of one week, 
the employment status variable may be modified 
so that the hazard model at time t is predicted by 
the employment status at week t − 1. Thus, the 
variable EMP(t) is replaced in the model by the 
variable EMP(t − 1). 

General lag-time extended model: 

$h(t,\mathbf{X}(t)) = h_0(t)\exp\left[\sum_{i=1}^{p_1}\beta_i X_i + \sum_{j=1}^{p_2}\delta_j X_j(t - L_j)\right]$ 

$X_j(t - L_j)$ replaces $X_j(t)$ 


More generally, the extended Cox model may be 
alternatively written to allow for a lag-time 
modification of any time-dependent variable of interest. 
If we let $L_j$ denote the lag-time specified for 
time-dependent variable j, then the general “lag-time 
extended model” can be written as shown here. 
Note that the variable $X_j(t)$ in the earlier version 
of the extended model is now replaced by the 
variable $X_j(t - L_j)$. 
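
A small Python sketch may help make the lag-time substitution concrete. It assumes a covariate recorded once per week, with weeks numbered from 1; the function name `lagged_value` and the example values are hypothetical, and how to treat weeks earlier than the first measurement is a choice the analyst must make (here the function simply returns None).

```python
def lagged_value(weekly_values, week, lag=1):
    """Return X(t - L) for a covariate recorded weekly (weeks numbered from 1).

    Returns None when week - lag falls outside the recorded weeks, leaving the
    handling of such weeks to the analyst.
    """
    source_week = week - lag
    if source_week < 1 or source_week > len(weekly_values):
        return None
    return weekly_values[source_week - 1]


emp = [1, 1, 0, 0, 1]  # hypothetical EMP(t) for weeks 1 to 5

for week in range(1, 6):
    same_week = lagged_value(emp, week, lag=0)   # EMP(t), no lag
    one_week_lag = lagged_value(emp, week)       # EMP(t - 1)
    print(week, same_week, one_week_lag)
```

With a one-week lag, the value entering the hazard model at week 3 is the employment status observed at week 2.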


V. The Hazard Ratio Formula 
for the Extended 
Cox Model 


PH assumption is not satisfied for 
the extended Cox model. 


We now describe the formula for the hazard ra¬ 
tio that derives from the extended Cox model. 
The most important feature of this formula is 
that the proportional hazards assumption is no 
longer satisfied when using the extended Cox 
model. 


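$\widehat{HR}(t) = \dfrac{\hat{h}(t,\mathbf{X}^*(t))}{\hat{h}(t,\mathbf{X}(t))} = \exp\left[\sum_{i=1}^{p_1}\hat{\beta}_i\,[X_i^* - X_i] + \sum_{j=1}^{p_2}\hat{\delta}_j\,[X_j^*(t) - X_j(t)]\right]$ 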


The general hazard ratio formula for the extended 
Cox model is shown here. This formula describes 
the ratio of hazards at a particular time t, and 
requires the specification of two sets of predictors 
at time t. These two sets are denoted as bold X*(t) 
and bold X(t). 


Two sets of predictors: 

$\mathbf{X}^*(t) = (X_1^*, X_2^*, \ldots, X_{p_1}^*, X_1^*(t), X_2^*(t), \ldots, X_{p_2}^*(t))$ 

$\mathbf{X}(t) = (X_1, X_2, \ldots, X_{p_1}, X_1(t), X_2(t), \ldots, X_{p_2}(t))$ 


The two sets of predictors, X*(t) and X(t), identify 
two specifications at time t for the combined set of 
predictors containing both time-independent and 
time-dependent variables. The individual components 
for each set of predictors are shown here. 

As a simple example, suppose the model contains 
only one time-independent predictor, namely, 
exposure status E, a (0,1) variable, and one 
time-dependent predictor, namely, E × t. Then, to 
compare exposed persons, for whom E = 1, with 
unexposed persons, for whom E = 0, at time 
t, the bold X*(t) set of predictors has as its 
two components E = 1 and E × t = t; the bold 
X(t) set has as its two components E = 0 and 
E × t = 0. 


If we now calculate the estimated hazard ratio that 
compares exposed to unexposed persons at time 
t, we obtain the formula shown here; that is, HR 
“hat” equals the exponential of β “hat” plus δ “hat” 
times t. This formula says that the hazard ratio is 
a function of time; in particular, if δ “hat” is positive, 
then the hazard ratio increases with increasing 
time. Thus, the hazard ratio in this example is 
certainly not constant, so that the PH assumption 
is not satisfied for this model. 
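
To see how this hazard ratio behaves, here is a brief Python sketch that evaluates exp(β “hat” + δ “hat” × t) at several times. The coefficient values are hypothetical and serve only to illustrate the shape of the function; they are not estimates from any dataset in this text.

```python
import math


def hazard_ratio(t, beta_hat, delta_hat):
    """Estimated HR at time t comparing E = 1 with E = 0 in the model
    h(t, X(t)) = h0(t) * exp[beta*E + delta*(E x t)]."""
    return math.exp(beta_hat + delta_hat * t)


beta_hat, delta_hat = 0.5, 0.1  # hypothetical coefficient estimates

for t in (0, 5, 10, 20):
    print(t, round(hazard_ratio(t, beta_hat, delta_hat), 3))
```

With a positive δ “hat” the printed hazard ratios rise with t; setting `delta_hat` to 0 gives the same value at every time, which is the PH case.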


$\widehat{HR}(t) = \exp\left[\sum_{i=1}^{p_1}\hat{\beta}_i\,[X_i^* - X_i] + \sum_{j=1}^{p_2}\hat{\delta}_j\,[X_j^*(t) - X_j(t)]\right]$ 

A function of time 


More generally, because the general hazard ratio 
formula involves differences in the values of the 
time-dependent variables at time t, this hazard ratio 
is a function of time. Thus, in general, the 
extended Cox model does not satisfy the PH 
assumption if any $\delta_j$ is not equal to zero. 


In general, PH assumption not satisfied 
for extended Cox model. 

$\hat{\delta}_j$ is not time-dependent. 
$\hat{\delta}_j$ represents “overall” effect of $X_j(t)$. 


Note that, in the hazard ratio formula, the coefficient 
δ_j “hat” of the difference in values 
of the jth time-dependent variable is itself not 
time-dependent. Thus, this coefficient represents 
the “overall” effect of the corresponding 
time-dependent variable, considering all times at which 
this variable has been measured in the study. 

As another example to illustrate the formula for 
the hazard ratio, consider an extended Cox model 
containing only one variable, say a weekly mea¬ 
sure of chemical exposure status at time t. Sup¬ 
pose this variable, denoted as E(t), can take one of 
two values, 0 or 1, depending on whether a person 
is unexposed or exposed, respectively, at a given 
weekly measurement. 

As defined, the variable E(t) can take on different 
patterns of values for different subjects. For 
example, for a five-week period, subject A's values 
may be 01011, whereas subject B's values may be 
11010. 

Note that in this example, we do not consider 
two separate groups of subjects, with one group 
always exposed and the other group always un¬ 
exposed throughout the study. This latter situa¬ 
tion would require a (0,1) time-independent vari¬ 
able for exposure, whereas our example involves 
a time-dependent exposure variable. 

The extended Cox model that includes only the 
variable E(t) is shown here. In this model, the values 
of the exposure variable may change over time 
for different subjects, but there is only one coefficient, 
δ, corresponding to the one variable in the 
model. Thus, δ represents the overall effect on survival 
time of the time-dependent variable E(t). 

Notice, also, that the hazard ratio formula, which 
compares an exposed person to an unexposed person 
at time t, yields the expression e to the δ “hat.” 

Although this result is a fixed number, the PH 
assumption is not satisfied. The fixed number gives 
the hazard ratio at a given time, assuming that the 
exposure status at that time is 1 in the numerator 
and is 0 in the denominator. Thus, the hazard ratio is 
time-dependent, because exposure status is 
time-dependent, even though the formula yields a single 
fixed number. 

VI. Assessing Time-Independent 
Variables That Do Not Satisfy 
the PH Assumption 


Use an extended Cox model to 

• check PH assumption; 

• assess effect of variable not 
satisfying PH assumption. 


We now discuss how to use an extended Cox model 
to check the PH assumption for time-independent 
variables and to assess the effect of a variable that 
does not satisfy the PH assumption. 


Three methods for checking PH as¬ 
sumption: 

1. graphical 

2. extended Cox model 

3. GOF test 


As described previously (see Chapter 4), there are 
three methods commonly used to assess the PH 
assumption: (1) graphical, using, say, log-log sur¬ 
vival curves; (2) using an extended Cox model; and 
(3) using a goodness-of-fit (GOF) test. We have pre¬ 
viously (in Chapter 4) discussed items 1 and 3, but 
only briefly described item 2, which we focus on 
here. 


Cox PH model for p time-independent X's: 

$h(t,\mathbf{X}) = h_0(t)\exp\left[\sum_{i=1}^{p}\beta_i X_i\right]$ 



If the dataset for our study contains several, say p, 
time-independent variables, we might wish to fit a 
Cox PH model containing each of these variables, 
as shown here. 


Extended Cox model: 

Add product terms of the form: 

$X_i \times g_i(t)$ 


However, to assess whether such a PH model is 
appropriate, we can extend this model by defining 
several product terms involving each 
time-independent variable with some function of time. 
That is, if the ith time-independent variable is denoted 
as $X_i$, then we can define the ith product 
term as $X_i \times g_i(t)$, where $g_i(t)$ is some function of 
time for the ith variable. 


$h(t,\mathbf{X}(t)) = h_0(t)\exp\left[\sum_{i=1}^{p}\beta_i X_i + \sum_{i=1}^{p}\delta_i X_i g_i(t)\right]$ 


The extended Cox model that simultaneously considers 
all time-independent variables of interest is 
shown here. 

EXAMPLE 

$g_i(t) = 0$ for all i implies no time-dependent 
variable involving $X_i$, i.e., 

$h(t,\mathbf{X}(t)) = h_0(t)\exp\left[\sum_{i=1}^{p}\beta_i X_i\right]$ 


In using this extended model, the crucial decision 
is the form that the functions $g_i(t)$ should take. 
The simplest form for $g_i(t)$ is that all $g_i(t)$ are 
identically 0 at any time; this is another way of 
stating the original PH model, containing no 
time-dependent terms. 



Another choice for the $g_i(t)$ is to let $g_i(t) = t$. This 
implies that for each $X_i$ in the model as a main effect, 
there is a corresponding time-dependent variable 
in the model of the form $X_i \times t$. The extended 
Cox model in this case takes the form shown here. 



Suppose, however, we wish to focus on a particular 
time-independent variable, say, variable $X_L$. 
Then $g_i(t) = t$ for i = L, but equals 0 for all other 
i. The corresponding extended Cox model would 
then contain only one product term $X_L \times t$, as 
shown here. 


EXAMPLE 4 

$g_i(t) = \ln t \Rightarrow X_i g_i(t) = X_i \times \ln t$ 

$h(t,\mathbf{X}(t)) = h_0(t)\exp\left[\sum_{i=1}^{p}\beta_i X_i + \sum_{i=1}^{p}\delta_i (X_i \times \ln t)\right]$ 


Another choice for the $g_i(t)$ is the log of t, rather 
than simply t, so that the corresponding 
time-dependent variables will be of the form $X_i \times \ln t$. 


EXAMPLE 5: Heaviside Function 

$g_i(t) = \begin{cases} 1 & \text{if } t \ge t_0 \\ 0 & \text{if } t < t_0 \end{cases}$ 


And yet another choice would be to let $g_i(t)$ be a 
“heaviside function” of the form $g_i(t) = 1$ when 
t is at or above some specified time, say $t_0$, and 
$g_i(t) = 0$ when t is below $t_0$. We will discuss this 
choice in more detail shortly. 


Extended Cox model: 


$h(t,\mathbf{X}(t)) = h_0(t)\exp\left[\sum_{i=1}^{p}\beta_i X_i + \sum_{i=1}^{p}\delta_i X_i g_i(t)\right]$ 


Given a particular choice of the $g_i(t)$, the corresponding 
extended Cox model, shown here again 
in general form, may then be used to check the PH 
assumption for the time-independent variables in 
the model. Also, we can use this extended Cox 
model to obtain a hazard ratio formula that considers 
the effects of variables not satisfying the PH 
assumption. 


• Check PH assumption. 

• Obtain hazard ratio when PH 
assumption not satisfied. 

$H_0\colon \delta_1 = \delta_2 = \cdots = \delta_p = 0$ 


To check the PH assumption using a statistical 
test, we consider the null hypothesis that all the δ 
terms, which are coefficients of the $X_i g_i(t)$ product 
terms in the model, are zero. 

Under $H_0$, the model reduces to PH model: 

$h(t,\mathbf{X}) = h_0(t)\exp\left[\sum_{i=1}^{p}\beta_i X_i\right]$ 

Under this null hypothesis, the model reduces to 
the PH model. 


$LR = -2\ln L_{\text{PH model}} - (-2\ln L_{\text{ext. Cox model}}) \sim \chi^2_p$ under $H_0$ 

This test can be carried out using a likelihood ratio 
(LR) test which computes the difference between 
the log likelihood statistic, −2 ln L, for the PH 
model and the log likelihood statistic for the 
extended Cox model. The test statistic thus obtained 
has approximately a chi-square distribution with 
p degrees of freedom under the null hypothesis, 
where p denotes the number of parameters being 
set equal to zero under $H_0$. 



As an example of this test, suppose we again con¬ 
sider an extended Cox model that contains the 
product term E x t in addition to the main effect 
of£, where £ denotes a (0,1) time-independent ex¬ 
posure variable. 

For this model, a test for whether or not the PH
assumption is satisfied is equivalent to testing the
null hypothesis that $\delta = 0$. Under this hypothesis,
the reduced model is given by the PH model containing
the main effect E only. The likelihood ratio
statistic, shown here as the difference between
log-likelihood statistics for the full (i.e., extended)
model and the reduced (i.e., PH) model, will have
an approximate chi-square distribution with one
degree of freedom in large samples.
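
The one-degree-of-freedom test just described can be sketched numerically as follows. This is only an illustration under stated assumptions: the two log-likelihood values are hypothetical (they do not come from any dataset in this text), and the function name `lr_test_one_df` is ours. The p-value uses the identity that, for one degree of freedom, the upper tail of the chi-square distribution at x equals erfc(sqrt(x/2)).

```python
import math


def lr_test_one_df(loglik_ph: float, loglik_extended: float):
    """Likelihood ratio test of H0: delta = 0 for a single product term.

    loglik_ph        log likelihood of the reduced (PH) model
    loglik_extended  log likelihood of the full (extended Cox) model
    """
    # LR = -2 ln L(PH model) - (-2 ln L(extended Cox model))
    lr = (-2.0 * loglik_ph) - (-2.0 * loglik_extended)
    # Upper-tail chi-square probability with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return lr, p_value


# Hypothetical log likelihoods, chosen only to illustrate the arithmetic.
lr_stat, p_val = lr_test_one_df(loglik_ph=-680.0, loglik_extended=-675.0)
print(f"LR = {lr_stat:.3f}, p = {p_val:.4f}")
```

In practice both log likelihoods would be read from the output of two separate model fits, one for the PH model and one for the extended Cox model.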


SAS: PHREG fits both PH and 
extended Cox models. 
Stata: Stcox fits both PH and 
extended Cox models. 


Note that to carry out the computations for this 
test, two different types of models, a PH model 
and an extended Cox model, need to be fit. 


If PH test significant: Extended Cox
model is preferred; HR is time-dependent.

If the result of the test for the PH assumption is significant,
then the extended Cox model is preferred
to the PH model. Thus, the hazard ratio expression
obtained for the effect of an exposure variable of
interest is time-dependent. That is, the effect of the
exposure on the outcome cannot be summarized
by a single HR value, but can only be expressed as
a function of time.

EXAMPLE

$$h(t, X(t)) = h_0(t) \exp[\beta E + \delta(E \times t)]$$

$$\widehat{HR} = \exp[\hat{\beta} + \hat{\delta} t]$$



We again consider the previous example, with the
extended Cox model shown here. For this model,
the estimated hazard ratio for the effect of exposure
is given by the expression e to the quantity
$\beta$ “hat” plus $\delta$ “hat” times t. Thus, depending on
whether $\delta$ “hat” is positive or negative, the estimated
hazard ratio will increase or decrease exponentially
as t increases. The graph shown here
gives a sketch of how the hazard ratio varies with
time if $\delta$ “hat” is positive.
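
To make the exponential dependence on t concrete, the short sketch below evaluates the estimated hazard ratio at several times. The coefficient values are hypothetical and chosen only for illustration; the function name `hazard_ratio` is ours.

```python
import math


def hazard_ratio(t: float, beta_hat: float, delta_hat: float) -> float:
    """Estimated HR for exposure E in the model with terms E and E x t."""
    return math.exp(beta_hat + delta_hat * t)


# Hypothetical estimates (not taken from any fitted model in the text).
beta_hat, delta_hat = 0.5, 0.1
times = [0, 1, 2, 5, 10]
hrs = [hazard_ratio(t, beta_hat, delta_hat) for t in times]

for t, hr in zip(times, hrs):
    print(f"t = {t:>2}: HR = {hr:.3f}")
```

Each unit increase in t multiplies the hazard ratio by exp($\hat{\delta}$), which is why the curve rises when $\hat{\delta}$ is positive and falls when it is negative.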


Heaviside function:

[Graph: HR versus t, with HR constant within each time interval]


We now provide a description of the use of a “heaviside”
function. When such a function is used, the
hazard ratio formula yields constant hazard ratios
for different time intervals, as illustrated in the accompanying
graph.



$$g(t) = \begin{cases} 1 & \text{if } t \ge t_0 \\ 0 & \text{if } t < t_0 \end{cases}$$

$$h(t, X(t)) = h_0(t) \exp[\beta E + \delta E g(t)]$$


Recall that a heaviside function is of the form g(t),
which takes on the value 1 if t is greater than or
equal to some specified value of t, called $t_0$, and
takes on the value 0 if t is less than $t_0$. An extended
Cox model which contains a single heaviside function
is shown here.


$t \ge t_0$: $g(t) = 1 \Rightarrow E \times g(t) = E$
$h(t, X) = h_0(t) \exp[(\beta + \delta)E]$
$\widehat{HR} = \exp[\hat{\beta} + \hat{\delta}]$

Note that if $t \ge t_0$, $g(t) = 1$, so the value of
$E \times g(t) = E$; the corresponding hazard function is of
the form $h_0(t) \times e$ to the quantity $(\beta + \delta)$ times E,
and the estimated hazard ratio for the effect of E
has the form e to the sum of $\beta$ “hat” plus $\delta$ “hat.”

$t < t_0$: $g(t) = 0 \Rightarrow E \times g(t) = 0$
$h(t, X) = h_0(t) \exp[\beta E]$
$\widehat{HR} = \exp[\hat{\beta}]$

If $t < t_0$, $g(t) = 0$, and the corresponding hazard ratio
simplifies to e to the $\beta$ “hat.”


A single heaviside function in the model

$$h(t, X(t)) = h_0(t) \exp[\beta E + \delta(E \times g(t))]$$

yields two hazard ratios:
$t \ge t_0$: $\widehat{HR} = \exp(\hat{\beta} + \hat{\delta})$
$t < t_0$: $\widehat{HR} = \exp(\hat{\beta})$

Thus, we have shown that the use of a single heaviside
function results in an extended Cox model
which gives two hazard ratio values, each value
being constant over a fixed time interval.

Alternative model with two heaviside functions:

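$$h(t, X(t)) = h_0(t) \exp[\delta_1(E \times g_1(t)) + \delta_2(E \times g_2(t))]$$

$$g_1(t) = \begin{cases} 1 & \text{if } t \ge t_0 \\ 0 & \text{if } t < t_0 \end{cases}
\qquad
g_2(t) = \begin{cases} 1 & \text{if } t < t_0 \\ 0 & \text{if } t \ge t_0 \end{cases}$$

Note: Main effect for E not in model.

Two HR’s from the alternative model:

$t \ge t_0$: $g_1(t) = 1$, $g_2(t) = 0$
$h(t, X) = h_0(t) \exp[\delta_1(E \times 1) + \delta_2(E \times 0)] = h_0(t) \exp[\delta_1 E]$
so that $\widehat{HR} = \exp(\hat{\delta}_1)$

$t < t_0$: $g_1(t) = 0$, $g_2(t) = 1$
$h(t, X) = h_0(t) \exp[\delta_1(E \times 0) + \delta_2(E \times 1)] = h_0(t) \exp[\delta_2 E]$
so that $\widehat{HR} = \exp(\hat{\delta}_2)$

Alternative model:

$$h(t, X(t)) = h_0(t) \exp[\delta_1(E \times g_1(t)) + \delta_2(E \times g_2(t))]$$

Original model:

$$h(t, X(t)) = h_0(t) \exp[\beta E + \delta(E \times g(t))]$$

$t \ge t_0$: $\widehat{HR} = \exp(\hat{\delta}_1) = \exp(\hat{\beta} + \hat{\delta})$
$t < t_0$: $\widehat{HR} = \exp(\hat{\delta}_2) = \exp(\hat{\beta})$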


There is actually an equivalent way to write this
model that uses two heaviside functions in the
same model. This alternative model is shown here.
The two heaviside functions are called $g_1(t)$ and
$g_2(t)$. Each of these functions is in the model as
part of a product term with the exposure variable
E. Note that this model does not contain a main
effect term for exposure.


For this alternative model, as for the earlier model
with only one heaviside function, two different
hazard ratios are obtained for different time intervals.
To obtain the first hazard ratio, we consider
the form that the model takes when $t \ge t_0$. In this
case, the value of $g_1(t)$ is 1 and the value of $g_2(t)$
is 0, so the exponential part of the model simplifies
to $\delta_1 \times E$; the corresponding formula for the
estimated hazard ratio is then e to the $\delta_1$ “hat.”

When $t < t_0$, the value of $g_1(t)$ is 0 and the value of
$g_2(t)$ is 1. Then, the exponential part of the model
becomes $\delta_2 \times E$, and the corresponding hazard
ratio formula is e to the $\delta_2$ “hat.”


Thus, using the alternative model, again shown
here, we obtain two distinct hazard ratio values.
Mathematically, these are the same values as obtained
from the original model containing only
one heaviside function. In other words, $\delta_1$ “hat” in
the alternative model equals $\beta$ “hat” plus $\delta$ “hat” in
the original model (containing one heaviside function),
and $\delta_2$ “hat” in the alternative model equals
$\beta$ “hat” in the original model.

Heaviside functions:

• two HR’s constant within two 
time intervals 

• extension: several HR's constant 
within several time intervals 


We have thus seen that heaviside functions can 
be used to provide estimated hazard ratios that 
remain constant within each of two separate time 
intervals of follow-up. We can also extend the use 
of heaviside functions to provide several distinct 
hazard ratios that remain constant within several 
time intervals. 
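
The equivalence between the one-heaviside and two-heaviside parameterizations described above can be checked with a small sketch. The cutpoint and coefficient values below are hypothetical, and the function names are ours; the only relationship taken from the text is that $\hat{\delta}_1 = \hat{\beta} + \hat{\delta}$ and $\hat{\delta}_2 = \hat{\beta}$.

```python
import math


def g(t: float, t0: float) -> int:
    """Heaviside function: 1 if t >= t0, 0 if t < t0."""
    return 1 if t >= t0 else 0


def hr_one_heaviside(t, t0, beta_hat, delta_hat):
    """HR from the model with terms E and E x g(t)."""
    return math.exp(beta_hat + delta_hat * g(t, t0))


def hr_two_heavisides(t, t0, delta1_hat, delta2_hat):
    """HR from the model with terms E x g1(t) and E x g2(t), no main effect."""
    g1 = g(t, t0)        # 1 at or after the cutpoint
    g2 = 1 - g1          # 1 before the cutpoint
    return math.exp(delta1_hat * g1 + delta2_hat * g2)


# Hypothetical values, for illustration only.
t0, beta_hat, delta_hat = 1.0, 0.4, 0.9
delta1_hat, delta2_hat = beta_hat + delta_hat, beta_hat

for t in (0.25, 0.99, 1.0, 3.0):
    a = hr_one_heaviside(t, t0, beta_hat, delta_hat)
    b = hr_two_heavisides(t, t0, delta1_hat, delta2_hat)
    print(f"t = {t}: one-heaviside HR = {a:.4f}, two-heaviside HR = {b:.4f}")
```

The two models are reparameterizations of each other, so either can be fit; the two-heaviside form has the convenience that each hazard ratio is obtained by exponentiating a single coefficient.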


Four time intervals:

[Graph: HR versus t (years), with axis ticks at 0, 0.5, 1.0, and 1.5]

Extended Cox model contains either

• $E$, $E \times g_1(t)$, $E \times g_2(t)$, $E \times g_3(t)$

or

• $E \times g_1(t)$, $E \times g_2(t)$, $E \times g_3(t)$, $E \times g_4(t)$


Suppose, for instance, that we wish to separate the 
data into four separate time intervals, and for each 
interval we wish to obtain a different hazard ratio 
estimate as illustrated in the graph shown here. 

We can obtain four different hazard ratios using 
an extended Cox model containing a main effect of 
exposure and three heaviside functions in the model 
as products with exposure. Or, we can use a model 
containing no main effect exposure term, but with 
product terms involving exposure with four heav¬ 
iside functions. 


[Time line: intervals 1, 2, 3, and 4, separated at 0.5, 1.0, and 1.5 years]

To illustrate the latter model, suppose, as shown
on the graph, that the first time interval goes from
time 0 to 0.5 of a year; the second time interval
goes from 0.5 to 1 year; the third time interval goes
from 1 year to a year and a half; and the fourth
time interval goes from a year and a half onward.


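$$h(t, X(t)) = h_0(t) \exp[\delta_1 E g_1(t) + \delta_2 E g_2(t) + \delta_3 E g_3(t) + \delta_4 E g_4(t)]$$

where

$$g_1(t) = \begin{cases} 1 & \text{if } 0 \le t < 0.5 \text{ year} \\ 0 & \text{if otherwise} \end{cases}$$

$$g_2(t) = \begin{cases} 1 & \text{if } 0.5 \text{ year} \le t < 1.0 \text{ year} \\ 0 & \text{if otherwise} \end{cases}$$

$$g_3(t) = \begin{cases} 1 & \text{if } 1.0 \text{ year} \le t < 1.5 \text{ years} \\ 0 & \text{if otherwise} \end{cases}$$

$$g_4(t) = \begin{cases} 1 & \text{if } t \ge 1.5 \text{ years} \\ 0 & \text{if otherwise} \end{cases}$$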
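
The four heaviside functions defined here can be sketched as indicator functions of t, as shown below. The coefficient values are hypothetical, the function names are ours, and the half-open intervals (lower bound included, upper bound excluded) are an assumption that follows the "t at or above $t_0$" convention used for a single heaviside function.

```python
import math

CUTPOINTS = (0.5, 1.0, 1.5)  # years


def heaviside_indicators(t: float):
    """Return [g1(t), g2(t), g3(t), g4(t)] for the three cutpoints above."""
    lower = (0.0,) + CUTPOINTS
    upper = CUTPOINTS + (math.inf,)
    return [1 if lo <= t < hi else 0 for lo, hi in zip(lower, upper)]


def hazard_ratio(t: float, delta_hats):
    """HR for E at time t: exp(sum of delta_k * g_k(t)); only one g_k is 1."""
    return math.exp(sum(d * g for d, g in zip(delta_hats, heaviside_indicators(t))))


# Hypothetical coefficient estimates, for illustration only.
delta_hats = [0.1, 0.4, 0.8, 1.2]

for t in (0.25, 0.75, 1.25, 2.0):
    print(t, heaviside_indicators(t), round(hazard_ratio(t, delta_hats), 3))
```

Because exactly one indicator equals 1 at any time, the hazard ratio in each interval is simply the exponential of that interval's coefficient.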


Then, an appropriate extended Cox model containing
the four heaviside functions $g_1(t)$, $g_2(t)$,
$g_3(t)$, and $g_4(t)$ is shown here. This model assumes
that there are four different hazard ratios identified
by three cutpoints at half a year, one year, and
one and a half years. The formulae for the four
hazard ratios are given by separately exponentiating
each of the four estimated coefficients, as
shown below:

4 HR’s:
$0 \le t < 0.5$: $\widehat{HR} = \exp(\hat{\delta}_1)$
$0.5 \le t < 1.0$: $\widehat{HR} = \exp(\hat{\delta}_2)$
$1.0 \le t < 1.5$: $\widehat{HR} = \exp(\hat{\delta}_3)$
$t \ge 1.5$: $\widehat{HR} = \exp(\hat{\delta}_4)$

VII. An Application of the
Extended Cox Model to
An Epidemiologic Study
on the Treatment of
Heroin Addiction


EXAMPLE 


1991 Australian study (Caplehorn
et al.) of heroin addicts

• two methadone treatment clinics 

• T = days remaining in treatment 
(= days until drop out of clinic) 

• clinics differ in treatment policies 

Dataset name: ADDICTS
Column 1: Subject ID
Column 2: Clinic (1 or 2)
Column 3: Survival status (0 = censored, 1 = departed clinic)
Column 4: Survival time in days
Column 5: Prison Record (0 = none, 1 = any) [covariate]
Column 6: Maximum Methadone Dose (mg/day) [covariate]


$$h(t, X) = h_0(t) \exp[\beta_1(\text{clinic}) + \beta_2(\text{prison}) + \beta_3(\text{dose})]$$

          Coef.    Std. Err.   p>|z|   Haz. Ratio   P(PH)
Clinic   -1.009    0.215       0.000   0.365        0.001
Prison    0.327    0.167       0.051   1.386        0.332
Dose     -0.035    0.006       0.000   0.965        0.347


P(PH) for the variables prison and dose are 
nonsignificant => remain in model 



A 1991 Australian study by Caplehorn et al. compared
retention in two methadone treatment clinics
for heroin addicts. A patient’s survival time
(T) was determined as the time in days until the
patient dropped out of the clinic or was censored
at the end of the study. The two clinics
differed according to their overall treatment
policies.

A listing of some of the variables in the dataset
for this study is shown here. The dataset is
called “ADDICTS,” and survival analysis programs
in the Stata package are used in the analysis. Note
that the survival time variable is listed in column
4 and the survival status variable, which indicates
whether a patient departed from the clinic or was
censored, is listed in column 3. The primary exposure
variable of interest is the clinic variable,
which is coded as 1 or 2. Two other variables of interest
are prison record status, listed in column 5
and coded as 0 if none and 1 if any, and maximum
methadone dose, in milligrams per day, which is
listed in column 6. These latter two variables are
considered as covariates.

One of the first models considered in the analysis
of the addicts dataset was a Cox PH model containing
the three variables, clinic, prison record,
and dose. An edited printout of the results for this
model is shown here. What stands out from this
printout is that the P(PH) value for the clinic variable
is 0.001, which is highly significant and
indicates that the clinic variable does not satisfy the
proportional hazards assumption.

Since the P(PH) values for the other two variables
in the model are highly nonsignificant, this suggests
that these two variables, namely, prison and
dose, can remain in the model.

EXAMPLE (continued)


Adjusted Survival Curves 



Results: 

• Curve for clinic 2 consistently lies above 
curve for clinic 1. 

• Curves diverge, with clinic 2 being vastly 
superior after one year. 


Stratifying by clinic: cannot obtain hazard 
ratio for clinic 

Hazard ratio for clinic requires clinic in the 
model. 


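Extended Cox model:

$$h(t, X(t)) = h_0(t) \exp[\beta_1(\text{clinic}) + \beta_2(\text{prison}) + \beta_3(\text{dose}) + \delta(\text{clinic}) g(t)]$$

where

$$g(t) = \begin{cases} 1 & \text{if } t \ge 365 \text{ days} \\ 0 & \text{if } t < 365 \text{ days} \end{cases}$$

and

$$\text{clinic} = \begin{cases} 1 & \text{if clinic 1} \\ 0 & \text{if clinic 2} \end{cases}$$

Note: Previously clinic = 2 for clinic 2

$t \ge 365$ days: $\widehat{HR} = \exp(\hat{\beta}_1 + \hat{\delta})$
$t < 365$ days: $\widehat{HR} = \exp(\hat{\beta}_1)$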


Further evidence of the PH assumption not be¬ 
ing satisfied for the clinic variable can be seen 
from a graph of adjusted survival curves strati¬ 
fied by clinic, where the prison and dose variables 
have been kept in the model. Notice that the two 
curves are much closer together at earlier times, 
roughly less than one year (i.e., 365 days), but the 
two curves diverge greatly after one year. This in¬ 
dicates that the hazard ratio for the clinic variable 
will be much closer to one at early times but quite 
different from one later on. 

The above graph, nevertheless, provides impor¬ 
tant results regarding the comparison of the two 
clinics. The curve for clinic 2 consistently lies 
above the curve for clinic 1, indicating that clinic 
2 does better than clinic 1 in retaining its patients 
in methadone treatment. Further, because the two 
curves diverge after about a year, it appears that 
clinic 2 is vastly superior to clinic 1 after one year 
but only slightly better than clinic 1 prior to one 
year. 

Unfortunately, because the clinic variable has been 
stratified in the analysis, we cannot use this anal¬ 
ysis to obtain a hazard ratio expression for the 
effect of clinic, adjusted for the effects of prison 
and dose. We can only obtain such an expression 
for the hazard ratio if the clinic variable is in the 
model. 

Nevertheless, we can obtain a hazard ratio us¬ 
ing an alternative analysis with an extended Cox 
model that contains a heaviside function, g(t), to¬ 
gether with the clinic variable, as shown here. 
Based on the graphical results shown earlier, a log¬ 
ical choice for the cutpoint of the heaviside func¬ 
tion is one year (i.e., 365 days). The corresponding 
model then provides two hazard ratios: one that is 
constant above 365 days and the other that is con¬ 
stant below 365 days. 

Note that in the extended Cox model here, we have
coded the clinic variable as 1 if clinic 1 and 0 if
clinic 2, whereas previously we had coded clinic
2 as 2. The reason for this change in coding, as
illustrated by computer output below, is to obtain
hazard ratio estimates that are greater than unity.

EXAMPLE (continued)

$$h(t, X(t)) = h_0(t) \exp[\beta_2(\text{prison}) + \beta_3(\text{dose}) + \delta_1(\text{clinic}) g_1(t) + \delta_2(\text{clinic}) g_2(t)]$$

where

$$g_1(t) = \begin{cases} 1 & \text{if } t < 365 \text{ days} \\ 0 & \text{if } t \ge 365 \text{ days} \end{cases}$$

and

$$g_2(t) = \begin{cases} 1 & \text{if } t \ge 365 \text{ days} \\ 0 & \text{if } t < 365 \text{ days} \end{cases}$$

$t < 365$ days: $\widehat{HR} = \exp(\hat{\delta}_1)$
$t \ge 365$ days: $\widehat{HR} = \exp(\hat{\delta}_2)$


                Coef.    Std. Err.   p>|z|   Haz. Ratio   [95% Conf. Interval]
Prison          0.378    0.168       0.025   1.459        1.049     2.029
Dose           -0.036    0.006       0.000   0.965        0.953     0.977
Clinic × g_1    0.460    0.255       0.072   1.583        0.960     2.611
Clinic × g_2    1.828    0.386       0.000   6.223        2.921    13.259

$t < 365$ days: $\widehat{HR} = e^{0.460} = 1.583$
$t \ge 365$ days: $\widehat{HR} = e^{1.828} = 6.223$

95% confidence intervals for clinic effect:
$t < 365$ days: (0.960, 2.611)
$t \ge 365$ days: (2.921, 13.259)



[Figure: Adjusted Survival Curves by clinic; horizontal axis: Days (0 to 1200), with 1 year marked]


An equivalent way to write the model is to use two
heaviside functions, $g_1(t)$ and $g_2(t)$, as shown here.
This latter model contains product terms involving
clinic with each heaviside function, and there
is no main effect of clinic.

Corresponding to the above model, the effect of
clinic is described by two hazard ratios, one for
time less than 365 days and the other for greater
than 365 days. These hazard ratios are obtained by
separately exponentiating the coefficients of each
product term, yielding e to the $\delta_1$ “hat” and e to
the $\delta_2$ “hat,” respectively.

A printout of results using the above model with 
two heaviside functions is provided here. The re¬ 
sults show a borderline nonsignificant hazard ra¬ 
tio (P = 0.072) of 1.6 for the effect of clinic when 
time is less than 365 days in contrast to a highly 
significant (P = 0.000 to three decimal places) haz¬ 
ard ratio of 6.2 when time exceeds 365 days. 

Note that the estimated hazard ratio of 1.583 from
the printout is computed by exponentiating the
estimated coefficient 0.460 of the product term
“clinic × $g_1$” and that the estimated hazard ratio
of 6.223 is computed by exponentiating the
estimated coefficient 1.828 of the product term
“clinic × $g_2$”.

Note also that the 95% confidence interval for the
clinic effect prior to 365 days (that is, for the product
term “clinic × $g_1(t)$”) is given by the limits
0.960 and 2.611, whereas the corresponding confidence
interval after 365 days (that is, for the product
term “clinic × $g_2(t)$”) is given by the limits 2.921
and 13.259. The latter interval is quite wide, showing
a lack of precision when t exceeds 365 days;
however, when t precedes 365 days, the interval
includes the null hazard ratio of 1, suggesting a
chance effect for this time period.
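
The arithmetic behind the two hazard ratios and their confidence intervals can be sketched as follows, using the rounded coefficients and standard errors from the printout. The function name `hr_with_ci` is ours, and the usual large-sample form exp(coefficient ± 1.96 × standard error) is assumed. Because the inputs are rounded to three decimals, the results agree with the printout only to about two decimal places.

```python
import math


def hr_with_ci(coef: float, se: float, z: float = 1.96):
    """Return (HR, lower limit, upper limit) for one coefficient."""
    return (
        math.exp(coef),
        math.exp(coef - z * se),
        math.exp(coef + z * se),
    )


# Rounded values from the printout for the two product terms.
before_365 = hr_with_ci(coef=0.460, se=0.255)  # clinic x g1
after_365 = hr_with_ci(coef=1.828, se=0.386)   # clinic x g2

print("t <  365 days: HR = %.3f, 95%% CI = (%.3f, %.3f)" % before_365)
print("t >= 365 days: HR = %.3f, 95%% CI = (%.3f, %.3f)" % after_365)
```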

The results we have just shown support the observations
obtained from the graph of adjusted survival
curves. That is, these results suggest a large
difference in clinic survival times after one year
in contrast to a small difference in clinic survival
times prior to one year, with clinic 2 always doing
better than clinic 1 at any time.

EXAMPLE (continued)


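One other analysis:

Use an extended Cox model that
provides for diverging survival curves

$$h(t, X(t)) = h_0(t) \exp[\beta_1(\text{clinic}) + \beta_2(\text{prison}) + \beta_3(\text{dose}) + \delta(\text{clinic} \times t)]$$

$$\widehat{HR} = \exp(\hat{\beta}_1 + \hat{\delta} t)$$

$\widehat{HR}$ changes over time.

t = 91 days:

$$h(t, X(t)) = h_0(t) \exp[\beta_1(\text{clinic}) + \beta_2(\text{prison}) + \beta_3(\text{dose}) + \delta(\text{clinic})(91)]$$

So $\widehat{HR} = \exp(\hat{\beta}_1 + 91\hat{\delta})$

t = 274:

$$h(t, X(t)) = h_0(t) \exp[\beta_1(\text{clinic}) + \beta_2(\text{prison}) + \beta_3(\text{dose}) + \delta(\text{clinic})(274)]$$

$\widehat{HR} = \exp(\hat{\beta}_1 + 274\hat{\delta})$

t = 458.5: $\widehat{HR} = \exp(\hat{\beta}_1 + 458.5\hat{\delta})$

t = 639: $\widehat{HR} = \exp(\hat{\beta}_1 + 639\hat{\delta})$

t = 821.5: $\widehat{HR} = \exp(\hat{\beta}_1 + 821.5\hat{\delta})$

$\hat{\delta} > 0 \Rightarrow \widehat{HR}$ increases as time increases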


There is, nevertheless, at least one other approach 
to the analysis using time-dependent variables 
that we now describe. This approach considers 
our earlier graphical observation that the survival 
curves for each clinic continue to diverge from 
one another even after one year. In other words, it 
is reasonable to consider an extended Cox model 
that allows for such a divergence, rather than a 
model that assumes the hazard ratios are constant 
before and after one year. 

One way to define an extended Cox model that provides
for diverging survival curves is shown here.
This model includes, in addition to the clinic variable
by itself, a time-dependent variable defined
as the product of the clinic variable with time (i.e.,
clinic × t). By including this product term, we
are able to estimate the effect of clinic on survival
time, and thus the hazard ratio, for any specified
time t.

To demonstrate how the hazard ratio changes over 
time for this model, we consider what the model 
and corresponding estimated hazard ratio expres¬ 
sion are for different specified values of t. 

For example, if we are interested in the effect of
clinic on survival on day 91, so that t = 91, the
exponential part of the model simplifies to terms
for the prison and dose variables plus $\beta_1$ times
the clinic variable plus $\delta$ times the clinic variable
times 91; the corresponding estimated hazard ratio
for the clinic effect is then e to the power $\beta_1$
“hat” plus $\delta$ “hat” times t = 91.

At 274 days, the exponential part of the model contains
the prison, dose, and clinic main effect terms
as before, plus $\delta$ times the clinic variable times
274; the corresponding hazard ratio for the clinic
effect is then e to $\beta_1$ “hat” plus 274 $\delta$ “hat”.

The formulae for the estimated hazard ratio for
other specified days are shown here. Notice that
the estimated hazard ratio changes
over the length of the follow-up period. In particular, if
$\delta$ “hat” is a positive number, then the estimated
hazard ratios will increase over time.

EXAMPLE (continued)

Computer results for extended Cox
model involving T(t):


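              Coef.     Std. Err.   P>|z|   Haz. Ratio   [95% Conf. Interval]
prison        0.390     0.169       0.021   1.476        1.060    2.056
dose         -0.035     0.006       0.000   0.965        0.953    0.978
clinic       -0.0183    0.347       0.958   0.982        0.497    1.939
clinic × t    0.003     0.001       0.001   1.003        1.001    1.005

$\widehat{cov}(\hat{\beta}_1, \hat{\delta}) = -.000259$    Log likelihood = -667.642

$\hat{\beta}_1 = -0.0183$

$\hat{\delta} = 0.003$

$\widehat{HR}$ depends on $\hat{\beta}_1$ and $\hat{\delta}$.

t = 91.5: $\widehat{HR} = \exp(\hat{\beta}_1 + \hat{\delta} t) = 1.292$
t = 274: $\widehat{HR} = \exp(\hat{\beta}_1 + \hat{\delta} t) = 2.233$
t = 458.5: $\widehat{HR} = \exp(\hat{\beta}_1 + \hat{\delta} t) = 3.862$
t = 639: $\widehat{HR} = \exp(\hat{\beta}_1 + \hat{\delta} t) = 6.677$
t = 821.5: $\widehat{HR} = \exp(\hat{\beta}_1 + \hat{\delta} t) = 11.544$

$$\exp\left[\hat{\beta}_1 + \hat{\delta} t \pm 1.96\sqrt{\widehat{Var}(\hat{\beta}_1 + \hat{\delta} t)}\right]$$

$$\widehat{Var}(\hat{\beta}_1 + \hat{\delta} t) = s^2_{\hat{\beta}_1} + t^2 s^2_{\hat{\delta}} + 2t\,\widehat{cov}(\hat{\beta}_1, \hat{\delta})$$

where $s^2_{\hat{\beta}_1} = (0.347)^2$, $s^2_{\hat{\delta}} = (0.001)^2$, and $\widehat{cov}(\hat{\beta}_1, \hat{\delta}) = -.000259$

Time (days)   HR        95% CI
91.5          1.292     (0.741, 2.250)
274           2.233     (1.470, 3.391)
458.5         3.862     (2.298, 6.491)
639           6.677     (3.102, 14.372)
821.5         11.544    (3.976, 33.513)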


We now show edited results obtained from fitting 
the extended Cox model we have just been de¬ 
scribing, which contains the product of clinic with 
time. The covariance estimate shown at the bot¬ 
tom of the table will be used below to compute 
confidence intervals. 

From these results, the estimated coefficient of the
clinic variable is $\beta_1$ “hat” equals -0.0183, and the
estimated coefficient $\delta$ “hat” obtained for the product
term equals 0.003. For the model being fit,
the hazard ratio depends on the values of both $\beta_1$
“hat” and $\delta$ “hat.”

On the left, the effect of the variable clinic is described
by five increasing hazard ratio estimates
corresponding to each of five different values of t.
These values, which range from 1.292 at
91.5 days to 11.544 at 821.5 days, indicate how
the effect of clinic diverges over time for the fitted
model.

We can also obtain 95% confidence intervals for
each of these hazard ratios using the large sample
formula shown here. The variance expression
in the formula is computed using the variances
and covariances which can be obtained from the
computer results given above. In particular, the
variances are $(0.347)^2$ and $(0.001)^2$ for $\beta_1$ “hat”
and $\delta$ “hat,” respectively; the covariance value is
-0.000259.
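
As a rough check on these calculations, the sketch below evaluates the hazard ratio and the large-sample confidence interval at the five listed times, using the rounded estimates from the printout. The function name `hr_and_ci` is ours. Because the printed coefficient and standard error of the product term are rounded to three decimals, the confidence limits computed this way differ somewhat from the tabulated ones. Also, with these rounded coefficients, t = 458.5 gives a hazard ratio of about 3.885, whereas the tabulated 3.862 corresponds to t of about 456.5.

```python
import math

# Rounded estimates taken from the printout.
BETA1_HAT = -0.0183   # coefficient of clinic
DELTA_HAT = 0.003     # coefficient of clinic x t
SE_BETA1 = 0.347
SE_DELTA = 0.001
COV = -0.000259       # estimated covariance of the two coefficients


def hr_and_ci(t: float, z: float = 1.96):
    """Return (HR, lower, upper) for the clinic effect at time t (days)."""
    log_hr = BETA1_HAT + DELTA_HAT * t
    # Var(b1 + d*t) = Var(b1) + t^2 Var(d) + 2 t Cov(b1, d)
    variance = SE_BETA1 ** 2 + (t ** 2) * SE_DELTA ** 2 + 2.0 * t * COV
    half_width = z * math.sqrt(variance)
    return math.exp(log_hr), math.exp(log_hr - half_width), math.exp(log_hr + half_width)


for t in (91.5, 274, 458.5, 639, 821.5):
    hr, lo, hi = hr_and_ci(t)
    print(f"t = {t:>6}: HR = {hr:.3f}, approx. 95% CI = ({lo:.3f}, {hi:.3f})")
```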

A table showing the estimated hazard ratios and
their corresponding 95% confidence intervals for
the clinic effect is given here. Note that all confidence
intervals are quite wide.

VIII. An Application of the
Extended Cox Model to
the Analysis of the
Stanford Heart
Transplant Data


EXAMPLE 


Patients identified as eligible for heart 
transplant: 

T = time until death or censorship 
65 patients receive transplants 
38 patients do not receive transplants 
n = 103 patients

Goal: Do patients receiving transplants 
survive longer than patients not receiv¬ 
ing transplants? 


One approach: 

Compare two separate groups: 65 trans¬ 
plants vs. 38 nontransplants 


Problem:

[Diagram: time line running from eligibility, through the wait-time, to receipt of transplant, and on to censoring or death; the total survival time spans the whole line]

Note: Wait-time contributes to survival
time for nontransplants.

Covariates:

Tissue mismatch score and age at transplant: prognostic only for transplants

Age at eligibility: not considered prognostic
for nontransplants


We now consider another application of the ex¬ 
tended Cox model which involves the use of an 
internally defined time-dependent variable. In a 
1977 report (Crowley and Hu, J. Amer. Statist. 
Assoc.) on the Stanford Heart Transplant Study, 
patients identified as being eligible for a heart 
transplant were followed until death or censor¬ 
ship. Sixty-five of these patients received trans¬ 
plants at some point during follow-up, whereas 
thirty-eight patients did not receive a transplant. 
There were, thus, a total of n = 103 patients. The 
goal of the study was to assess whether patients re¬ 
ceiving transplants survived longer than patients 
not receiving transplants. 

One approach to the analysis of this data was 
to separate the dataset into two separate groups, 
namely, the 65 heart transplant patients and the 
38 patients not receiving transplants, and then to 
compare survival times for these groups. 

A problem with this approach, however, is that 
those patients who received transplants had to 
wait from the time they were identified as eligible 
for a transplant until a suitable transplant donor 
was found. During this “wait-time” period, they 
were at risk for dying, yet they did not have the 
transplant. Thus, the wait-time accrued by trans¬ 
plant patients contributes information about the 
survival of nontransplant patients. Yet, this wait¬ 
time information would be ignored if the total 
survival time for each patient were used in the 
analysis. 

Another problem with this approach is that two
covariates of interest, namely, tissue mismatch
score and age at transplant, were considered as
prognostic indicators of survival only for patients
who received transplants. Note that age at eligibility
was not considered an important prognostic
factor for the nontransplant group.

EXAMPLE (continued)


Problems: 

• wait-time of transplant recipients 

• prognostic factors for transplants 
only 

Alternative approach:

Uses an extended Cox model

Exposure variable:

Heart transplant status at time t, defined as

$$HT(t) = \begin{cases} 0 & \text{if did not receive transplant by time } t \text{, i.e., if } t < \text{wait-time} \\ 1 & \text{if received transplant prior to time } t \text{, i.e., if } t \ge \text{wait-time} \end{cases}$$

[Diagram: for a patient with no transplant, HT(t) = 0 throughout follow-up; for a transplant patient, HT(t) = 0 until the time of transplant and 1 thereafter]

Wait-time for transplants contributes to survival
for nontransplants.


In addition to HT(t), two time-dependent 
covariates included in model. 


Because of the problems just described, which 
concern the wait-time of transplants and the ef¬ 
fects of prognostic factors attributable to trans¬ 
plants only, an alternative approach to the analysis 
is recommended. This alternative involves the use 
of time-dependent variables in an extended Cox 
model. 

The exposure variable of interest in this extended
Cox model is heart transplant status at time t, denoted
by HT(t). This variable is defined to take
on the value 0 at time t if the patient has not
received a transplant at this time, that is, if t is
less than the wait-time for receiving a transplant.
The value of this variable is 1 at time t if the
patient has received a transplant prior to or at
time t, that is, if t is equal to or greater than the
wait-time.

Thus, for a patient who did not receive a transplant
during the study, the value of HT(t) is 0 at all times.
For a patient receiving a transplant, the value of
HT(t) is 0 at the start of eligibility and continues
to be 0 until the time at which the patient receives
the transplant; then, the value of HT(t) changes
to 1 and remains 1 throughout the remainder of
follow-up.

Note that the variable HT(t) has the property that
the wait-time for transplant patients contributes
to the survival experience of nontransplant patients.
In other words, this variable treats a transplant
patient as a nontransplant patient prior to
receiving the transplant.

In addition to the exposure variable HT(t), two
other time-dependent variables are included in
our extended Cox model for the transplant data.
These variables are covariates to be adjusted for
in the assessment of the effect of the HT(t) variable.

EXAMPLE (continued)

Covariates:

$$TMS(t) = \begin{cases} 0 & \text{if } t < \text{wait-time} \\ TMS & \text{if } t \ge \text{wait-time} \end{cases}$$

$$AGE(t) = \begin{cases} 0 & \text{if } t < \text{wait-time} \\ AGE & \text{if } t \ge \text{wait-time} \end{cases}$$

$$h(t, X(t)) = h_0(t) \exp[\delta_1 HT(t) + \delta_2 TMS(t) + \delta_3 AGE(t)]$$

Focus:

Assessing the effect of HT(t) adjusted
for TMS(t) and AGE(t).

Note: HT(t) does not satisfy PH
assumption.

Variable   Coef.      Std. Err.   P>|z|   Haz. Ratio
HT(t)      -3.1718    1.1861      0.008   0.0417
TMS(t)      0.4442    0.2802      0.112   1.5593
AGE(t)      0.0552    0.0226      0.014   1.0567

$$\widehat{HR} = e^{-3.1718} = 0.0417 = \frac{1}{23.98}$$

$$\widehat{HR} = \frac{\hat{h}(\text{transplants})}{\hat{h}(\text{nontransplants})} \approx \frac{1}{24}\,?$$

Not appropriate!


These covariates are denoted as TMS(t) and 
AGE(t) and they are defined as follows: TMS(t) 
equals 0 if t is less than the wait-time for a trans¬ 
plant but changes to the “tissue mismatch score” 
(TMS) at the time of the transplant if t is equal 
to or greater than the wait-time. Similarly, AGE(t) 
equals 0 if t is less than the wait-time but changes 
to AGE at time of transplant if t is equal to or 
greater than the wait-time. 
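
The definitions of HT(t), TMS(t), and AGE(t) can be sketched as simple functions of t and the patient's wait-time. The two patients below are hypothetical, and the function names and the use of `None` to mark a patient who never received a transplant are our own conventions for this illustration.

```python
from typing import Optional


def ht(t: float, wait_time: Optional[float]) -> int:
    """HT(t): 0 before transplant (or if none occurred), 1 from the wait-time on."""
    if wait_time is None:
        return 0
    return 1 if t >= wait_time else 0


def tms_t(t: float, wait_time: Optional[float], tms: Optional[float]) -> float:
    """TMS(t): 0 before transplant, the tissue mismatch score afterward."""
    return float(tms) if ht(t, wait_time) == 1 else 0.0


def age_t(t: float, wait_time: Optional[float], age: Optional[float]) -> float:
    """AGE(t): 0 before transplant, age at transplant afterward."""
    return float(age) if ht(t, wait_time) == 1 else 0.0


# Hypothetical patients, for illustration only.
transplant_patient = {"wait_time": 30.0, "tms": 1.2, "age": 48.0}
nontransplant_patient = {"wait_time": None, "tms": None, "age": None}

for label, p in (("transplant", transplant_patient), ("no transplant", nontransplant_patient)):
    for t in (10.0, 30.0, 100.0):
        row = (ht(t, p["wait_time"]),
               tms_t(t, p["wait_time"], p["tms"]),
               age_t(t, p["wait_time"], p["age"]))
        print(label, t, row)
```

All three variables switch from 0 to their post-transplant values at the same moment, so a transplant patient is treated exactly like a nontransplant patient throughout the wait-time.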

The extended Cox model for the transplant data is
shown here. The model contains the three time-dependent
variables HT(t), TMS(t) and AGE(t) as
described above.

For this model, since HT(t) is the exposure vari¬ 
able of interest, the focus of the analysis concerns 
assessing the effect of this variable adjusted for 
the two covariates. Note, however, that because 
the HT(t) variable is time-dependent by definition, 
this variable does not satisfy the PH assumption, 
so that any hazard ratio estimate obtained for this 
variable is technically time-dependent. 


A summary of computer results for the fit of the 
above extended Cox model is shown here. These 
results indicate that the exposure variable HT(t) is 
significant below the one percent significance level 
(i.e., the two-sided p-value is 0.008). Thus, trans¬ 
plant status appears to be significantly associated 
with survival. 

To evaluate the strength of the association, note
that e to the coefficient of HT(t) equals 0.0417.
Since 1 over 0.0417 is 23.98, it appears that there is
a 24-fold increase in the hazard of nontransplant
patients relative to transplant patients. The preceding
interpretation of the value 0.0417 as a hazard ratio
estimate is not appropriate, however, as we shall
now discuss further.

EXAMPLE (continued)


23.98 is inappropriate as a HR: 

• does not compare two separate 
groups 

• exposure variable is not time- 
independent 

• wait-time on transplants contributes 
to survival on nontransplants 


Alternative interpretation: 

At time t, 

$\hat{h}$(“not yet received transplant”)

$\approx 24\,\hat{h}$(“already received transplant”)


More appropriate: 

Hazard ratio formula should account 
for TMS and AGE. 

Transplant?   HT(t)   TMS(t)   AGE(t)
Yes           1       TMS      AGE
No            0       0        0

i denotes ith transplant patient

$X^*(t) = (HT(t) = 1,\ TMS(t) = TMS_i,\ AGE(t) = AGE_i)$
$X(t) = (HT(t) = 0,\ TMS(t) = 0,\ AGE(t) = 0)$

$$\widehat{HR}(t) = \exp[\hat{\delta}_1(1 - 0) + \hat{\delta}_2(TMS_i - 0) + \hat{\delta}_3(AGE_i - 0)]$$

$$= \exp[\hat{\delta}_1 + \hat{\delta}_2 TMS_i + \hat{\delta}_3 AGE_i]$$

$$= \exp[-3.1718 + 0.4442\,TMS_i + 0.0552\,AGE_i]$$

First, note that the value of 23.98 inappropri¬ 
ately suggests that the hazard ratio is compar¬ 
ing two separate groups of patients. However, the 
exposure variable in this analysis is not a time- 
independent variable that distinguishes between 
two separate groups. In contrast, the exposure 
variable is time-dependent, and uses the wait-time 
information on transplants as contributing to the 
survival experience of non-transplants. 

Since the exposure variable is time-dependent, an 
alternative interpretation of the hazard ratio esti¬ 
mate is that, at any given time t, the hazard for a 
person who has not yet received a transplant (but 
may receive one later) is approximately 24 times 
the hazard for a person who already has received a 
transplant by that time. 

Actually, we suggest that a more appropriate hazard
ratio expression is required to account for
a transplant patient's TMS and AGE values. Such an expression
would compare, at time t, the values of
each of the three time-dependent variables in the
model. For a person who received a transplant,
these values are 1 for HT(t) and TMS and AGE for
the two covariates. For a person who has not received
a transplant, the values of all three variables
are 0.

Using this approach to compute the hazard ratio,
the $X^*(t)$ vector, which specifies the predictors for
a patient i who received a transplant at time t, has
the values 1, $TMS_i$, and $AGE_i$ for patient i; the X(t)
vector, which specifies the predictors at time t for
a patient who has not received a transplant at time
t, has values of 0 for all three predictors.

The hazard ratio formula then reduces to e to the
sum of $\delta_1$ “hat” plus $\delta_2$ “hat” times $TMS_i$ plus $\delta_3$
“hat” times $AGE_i$, where the $\delta$ “hat's” are the estimated
coefficients of the three time-dependent
variables. Substituting the numerical values for
these coefficients in the formula gives the exponential
expression circled here.

EXAMPLE (continued)

$\widehat{HR}(t)$ is time-dependent, i.e., its value at
time t depends on $TMS_i$ and $AGE_i$ at
time t

TMS range: (0-3.05) 

AGE range: (12-64) 


The resulting formula for the hazard ratio is
time-dependent in that its value depends on the TMS
and AGE values of the ith patient at the time of
transplant. That is, different patients can have
different values for TMS and AGE at time of
transplant. Note that in the dataset, TMS ranged
between 0 and 3.05 and AGE ranged between 12 and
64.
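
The dependence of the hazard ratio on each transplant patient's TMS and AGE can be sketched numerically. The following is a minimal illustration that simply evaluates the exponential expression above; the function name and the (TMS, AGE) pairs are hypothetical choices made here, picked inside the ranges reported for the dataset.

```python
import math

# Estimated coefficients quoted in the text for HT(t), TMS(t), and AGE(t).
DELTA_HT = -3.1718
DELTA_TMS = 0.4442
DELTA_AGE = 0.0552


def transplant_hazard_ratio(tms, age):
    """Estimated hazard ratio comparing a transplanted patient (with the
    given TMS and AGE at transplant) to a patient not yet transplanted."""
    return math.exp(DELTA_HT + DELTA_TMS * tms + DELTA_AGE * age)


if __name__ == "__main__":
    # Hypothetical (TMS, AGE) pairs inside the observed ranges.
    for tms, age in [(0.0, 12), (1.5, 40), (3.05, 64)]:
        hr = transplant_hazard_ratio(tms, age)
        print(f"TMS={tms:4.2f}  AGE={age:2d}  HR={hr:.3f}")
```

Because both covariate coefficients are positive, the estimated hazard ratio increases with higher mismatch scores and older age at transplant.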


We end our discussion of the Stanford Heart
Transplant Study at this point. For further insight
into the analysis of this dataset, we refer the reader
to the 1977 paper by Crowley and Hu (J. Amer.
Statist. Assoc.).


IX. The Extended Cox 
Likelihood 


ID      TIME   STATUS   SMOKE
Barry   2      1        1
Gary    3      1        0
Harry   5      0        0
Larry   8      1        1

TIME = Survival time (in years)
STATUS = 1 for event, 0 for censorship
SMOKE = 1 for a smoker, 0 for a nonsmoker


At the end of the presentation from Chapter 3 (Sec¬ 
tion VIII), we illustrated the Cox likelihood using 
the dataset shown on the left. In this section we 
extend that discussion to illustrate the Cox likeli¬ 
hood with a time-dependent variable. 

To review: The data indicate that Barry got the 
event at TIME = 2 years. Gary got the event at 
3 years, Harry was censored at 5 years, and Larry 
got the event at 8 years. Furthermore, Barry and 
Larry were smokers whereas Gary and Harry were 
nonsmokers. 


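Cox PH model: \( h(t) = h_0(t)\, e^{\beta_1 SMOKE} \)

Cox PH Likelihood

\[
L = \left[ \frac{h_0(t) e^{\beta_1}}{h_0(t) e^{\beta_1} + h_0(t) e^{0} + h_0(t) e^{0} + h_0(t) e^{\beta_1}} \right]
\times \left[ \frac{h_0(t) e^{0}}{h_0(t) e^{0} + h_0(t) e^{0} + h_0(t) e^{\beta_1}} \right]
\times \left[ \frac{h_0(t) e^{\beta_1}}{h_0(t) e^{\beta_1}} \right]
\]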


In Chapter 3 we constructed the Cox likelihood 
with one predictor SMOKE in the model. The 
model and the likelihood are shown on the left. 
The likelihood is a product of three terms, one 
term for each event time tj (TIME = 2, 3, and 8). 
The denominator of each term is the sum of the 
hazards from the subjects still in the risk set at 
time tj, including the censored subject Harry. The 
numerator of each term is the hazard of the sub¬ 
ject who got the event at tj. The reader may wish 
to reread Section VIII of Chapter 3. 
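
The construction of this likelihood can be sketched in a few lines of code. The following is a minimal illustration, assuming no tied event times (true for these four subjects); the function name and the tuple layout are choices made here for the sketch. The baseline hazard is omitted because it is common to every term of each ratio.

```python
import math

# Toy dataset from the text: (ID, TIME, STATUS, SMOKE).
DATA = [
    ("Barry", 2, 1, 1),
    ("Gary", 3, 1, 0),
    ("Harry", 5, 0, 0),
    ("Larry", 8, 1, 1),
]


def cox_ph_likelihood(beta1, data=DATA):
    """Cox PH likelihood for the model h(t) = h0(t) * exp(beta1 * SMOKE).

    One term per event time: the hazard of the subject who got the event
    divided by the sum of hazards over everyone still at risk.
    """
    likelihood = 1.0
    for _, t_event, status, smoke_event in data:
        if status != 1:
            continue  # censored subjects contribute no numerator term
        risk_set = [row for row in data if row[1] >= t_event]
        numerator = math.exp(beta1 * smoke_event)
        denominator = sum(math.exp(beta1 * row[3]) for row in risk_set)
        likelihood *= numerator / denominator
    return likelihood


if __name__ == "__main__":
    for b in (0.0, 0.5, 1.0):
        print(f"beta1={b:.1f}  L={cox_ph_likelihood(b):.5f}")
```

At beta1 = 0 every subject has the same hazard, so the three terms are 1/4, 1/3, and 1/1, matching the sizes of the risk sets at t = 2, 3, and 8.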


Cox extended model

\[ h(t) = h_0(t)\, e^{\beta_1 SMOKE + \beta_2 SMOKE \times TIME} \]

SMOKE x TIME: time-dependent covariate
(its value changes over time)


Now consider an extended Cox model, which con¬ 
tains the predictor SMOKE, and a time-dependent 
variable SMOKE x TIME. For this model it is not 
only the baseline hazard that may change over 
time but also the value of the predictor variables. 
This can be illustrated by examining Larry’s haz¬ 
ard at each event time. 


Larry got the event at TIME = 8
Larry's hazard at each event time

TIME   Larry's Hazard
2      h_0(t) e^{\beta_1 + 2\beta_2}
3      h_0(t) e^{\beta_1 + 3\beta_2}
8      h_0(t) e^{\beta_1 + 8\beta_2}

Cox extended model

\[
L = \left[ \frac{h_0(t) e^{\beta_1 + 2\beta_2}}{h_0(t) e^{\beta_1 + 2\beta_2} + h_0(t) e^{0} + h_0(t) e^{0} + h_0(t) e^{\beta_1 + 2\beta_2}} \right]
\times \left[ \frac{h_0(t) e^{0}}{h_0(t) e^{0} + h_0(t) e^{0} + h_0(t) e^{\beta_1 + 3\beta_2}} \right]
\times \left[ \frac{h_0(t) e^{\beta_1 + 8\beta_2}}{h_0(t) e^{\beta_1 + 8\beta_2}} \right]
\]

Likelihood is product of 3 terms:
L = L_1 x L_2 x L_3

L_1: Barry (t = 2)
L_2: Gary (t = 3)
L_3: Larry (t = 8)


Larry, a smoker, got the event at TIME = 8.
However, at TIME = 2, 3, and 8, the covariate
SMOKE x TIME changes values, thus affecting
Larry’s hazard at each event time (see left).
Understanding how the expression for an individual’s
hazard changes over time is the key to
understanding how the Cox extended likelihood
differs from the Cox PH likelihood.


The likelihood for the extended Cox model is con¬ 
structed in a similar manner to that of the likeli¬ 
hood for the Cox PH model. The difference is that 
the expression for the subject’s hazard is allowed 
to vary over time. The extended Cox likelihood for 
these data is shown on the left. 

Just as with the Cox PH likelihood shown
previously, the extended Cox likelihood is also a product
of three terms, corresponding to the three event
times (L = L_1 x L_2 x L_3). Barry got the event
first at t = 2, then Gary at t = 3, and finally Larry
at t = 8. Harry, who was censored at t = 5, was
still at risk when Barry and Gary got the event.
Therefore, Harry's hazard is still in the
denominator of L_1 and L_2.


SMOKE x TIME = 0 for nonsmokers

SMOKE x TIME changes over time
for smokers

Larry’s hazard changes over L_1, L_2,
L_3.


The inclusion of the time-varying covariate
SMOKE x TIME does not change the expression
for the hazard for the nonsmokers (Gary and
Harry) because SMOKE is coded 0 for nonsmokers.
However, for smokers (Barry and Larry), the
expression for the hazard changes with time.
Notice how Larry's hazard changes in the
denominator of L_1, L_2, and L_3 (see dashed arrows above).

h_0(t) cancels in L

\[
L = \left[ \frac{e^{\beta_1 + 2\beta_2}}{e^{\beta_1 + 2\beta_2} + e^{0} + e^{0} + e^{\beta_1 + 2\beta_2}} \right]
\times \left[ \frac{e^{0}}{e^{0} + e^{0} + e^{\beta_1 + 3\beta_2}} \right]
\times \left[ \frac{e^{\beta_1 + 8\beta_2}}{e^{\beta_1 + 8\beta_2}} \right]
\]


Incorrect coding of SMOKE x TIME



not time-dependent 


Incorrectly coded SMOKE x TIME 

• Time independent 

• Probably highly significant 

• Survival time should predict 
survival time 

• But not meaningful 


Correctly coding SMOKE x TIME 

• Time dependent 

• Computer packages allow 
definition in the analytic 
procedure 

• See Computer Appendix for 
details 


The baseline hazard cancels in the extended Cox 
likelihood as it does with the Cox PH likelihood. 
Thus, the form of the baseline hazard need not be 
specified, as it plays no role in the estimation of 
the regression parameters. 
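
The cancellation of the baseline hazard can be checked numerically. The sketch below is illustrative only: the function name, the optional `h0` argument, and the arbitrary baseline used in the check are assumptions introduced here, and the code assumes no tied event times. Each subject's hazard is evaluated at the event time of the term being computed, so Larry's hazard differs across L_1, L_2, and L_3.

```python
import math

# Toy dataset from the text: (ID, TIME, STATUS, SMOKE).
DATA = [
    ("Barry", 2, 1, 1),
    ("Gary", 3, 1, 0),
    ("Harry", 5, 0, 0),
    ("Larry", 8, 1, 1),
]


def extended_cox_likelihood(beta1, beta2, data=DATA, h0=lambda t: 1.0):
    """Likelihood for h(t) = h0(t) * exp(beta1*SMOKE + beta2*SMOKE*t).

    h0 is an arbitrary (hypothetical) baseline hazard; it multiplies the
    numerator and every denominator term, so it has no effect on the result.
    """
    likelihood = 1.0
    for _, t_event, status, smoke_event in data:
        if status != 1:
            continue  # only event times generate a likelihood term

        def hazard(smoke, t=t_event):
            # SMOKE x TIME is evaluated at the current event time t.
            return h0(t) * math.exp(beta1 * smoke + beta2 * smoke * t)

        denominator = sum(hazard(row[3]) for row in data if row[1] >= t_event)
        likelihood *= hazard(smoke_event) / denominator
    return likelihood


if __name__ == "__main__":
    plain = extended_cox_likelihood(0.4, -0.1)
    with_baseline = extended_cox_likelihood(0.4, -0.1, h0=lambda t: 3.0 + t)
    print(plain, with_baseline)  # the two values agree up to rounding
```

Setting beta2 = 0 in this function recovers the Cox PH likelihood with SMOKE as the only predictor.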


A word of caution for those planning to run a
model with a time-varying covariate: it is
incorrect to create a product term with TIME in the
data step by multiplying each individual’s value
for SMOKE with his survival time. In other words,
SMOKE x TIME should not be coded like the
typical interaction term. In fact, if SMOKE x TIME
were coded as it is on the left, then SMOKE x
TIME would be a time-independent variable.
Larry's value for SMOKE x TIME is incorrectly
coded at a constant value of 8 even though Larry's
value for SMOKE x TIME changes in the
likelihood over L_1, L_2, and L_3.

If the incorrectly coded time-independent 
SMOKE x TIME were included in a Cox model it 
would not be surprising if the coefficient estimate 
were highly significant even if the PH assumption 
were not violated. It would be expected that a 
product term with each individual’s survival time 
would predict the outcome (his survival time), 
but it would not be meaningful. Nevertheless, this 
is a common mistake. 

To obtain a correctly defined SMOKE x TIME
time-dependent variable, computer packages
typically allow the variable to be defined within the
analytic procedure. See the Computer Appendix for
how time-dependent variables are defined in
Stata, SAS, and SPSS.

Coding SMOKE x TIME as time-dependent

Multiple Observations per Subject

ID      TIME   STATUS   SMOKE   SMOKE x TIME
Barry   2      1        1       2
Gary    2      0        0       0
Gary    3      1        0       0
Harry   2      0        0       0
Harry   3      0        0       0
Harry   5      0        0       0
Larry   2      0        1       2
Larry   3      0        1       3
Larry   5      0        1       5
Larry   8      1        1       8

SMOKE x TIME column: coded as time-dependent


Multiple observations per subject: 
revisited in Chapter 8 (recurrent 
events) 


When a time-dependent variable is defined within
the Cox analytic procedure, the variable is defined
internally such that the user may not see the
time-dependent variable in the dataset. However, the
dataset on the left will provide a clearer idea of the
correct definition of SMOKE x TIME. The dataset
contains multiple observations per subject. Barry
was at risk at t = 2 and got the event at that time.
Gary was at risk at t = 2 and t = 3. Gary didn't get
the event at t = 2 but did get the event at t = 3.
Harry was at risk at t = 2, t = 3, and t = 5 and didn't
get the event. Larry was at risk at t = 2, t = 3,
t = 5, and t = 8 and got the event at t = 8. Notice how
the SMOKE x TIME variable changes values for
Larry over time.
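
The multiple-observations layout can be generated from the one-row-per-subject data. The following is a minimal sketch rather than the procedure any particular package uses internally; the function name is hypothetical, and the choice to create a row at every distinct observed time (including Harry's censoring time) is made here so that the output matches the layout described above.

```python
# Toy dataset from the text: (ID, TIME, STATUS, SMOKE).
DATA = [
    ("Barry", 2, 1, 1),
    ("Gary", 3, 1, 0),
    ("Harry", 5, 0, 0),
    ("Larry", 8, 1, 1),
]


def expand_rows(data):
    """Return rows (ID, TIME, STATUS, SMOKE, SMOKE x TIME), one for each
    observed time at which the subject was still at risk."""
    observed_times = sorted({row[1] for row in data})
    rows = []
    for name, t_end, status, smoke in data:
        for t in observed_times:
            if t > t_end:
                break  # subject is no longer at risk after own TIME
            # The event indicator is 1 only on the subject's final row.
            row_status = status if t == t_end else 0
            rows.append((name, t, row_status, smoke, smoke * t))
    return rows


if __name__ == "__main__":
    print("ID      TIME  STATUS  SMOKE  SMOKExTIME")
    for name, t, status, smoke, product in expand_rows(DATA):
        print(f"{name:<7} {t:<5} {status:<7} {smoke:<6} {product}")
```

Coding the product as `smoke * t` on each row, rather than once per subject with the final survival time, is what makes the variable time-dependent.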


Survival analysis datasets containing multiple ob¬ 
servations per subject are further discussed in 
Chapter 8 on recurrent events. With recurrent 
event data, subjects may remain at risk for sub¬ 
sequent events after getting an event. 


X. Summary 

Review Cox PH model. 

Define time-dependent variable: 
defined, internal, ancillary. 


A summary of this presentation on time- 
dependent variables is now provided. We began by 
reviewing the main features of the Cox PH model. 
We then defined a time-dependent variable and il¬ 
lustrated three types of these variables—defined, 
internal, and ancillary. 


Extended Cox model:

\[
h(t,X(t)) = h_0(t) \exp\left[ \sum_{i=1}^{p_1} \beta_i X_i + \sum_{j=1}^{p_2} \delta_j X_j(t) \right]
\]


Next, we gave the form of the “extended Cox 
model,” shown here again, which allows for time- 
dependent as well as time-independent variables. 


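\[
HR(t) = \exp\left[ \sum_{i=1}^{p_1} \hat{\beta}_i \left[ X_i^* - X_i \right] + \sum_{j=1}^{p_2} \hat{\delta}_j \left[ X_j^*(t) - X_j(t) \right] \right]
\]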


We then described various characteristics of this 
extended Cox model, including the formula for the 
hazard ratio. The latter formula is time-dependent 
so that the PH assumption is not satisfied. 


Function of time

Model for assessing PH assumption:

We also showed how to use time-dependent
variables to assess the PH assumption for
time-independent variables. A general formula for an
extended Cox model that simultaneously considers
all time-independent variables of interest is
shown here.


\[
h(t,X(t)) = h_0(t) \exp\left[ \sum_{i=1}^{p} \beta_i X_i + \sum_{i=1}^{p} \delta_i X_i g_i(t) \right]
\]


Examples of g_i(t):
t, log t, heaviside function


The functions g_i(t) denote functions of time for
the ith variable that are to be determined by the
investigator. Examples of such functions are g_i(t) =
t, log t, or a heaviside function.


Heaviside functions:

The use of heaviside functions was described and
illustrated. Such functions allow for the hazard
ratio to be constant within different time intervals.


\[ h(t,X(t)) = h_0(t) \exp[\beta E + \delta E g(t)] \]

where

\[
g(t) = \begin{cases} 1 & \text{if } t \ge t_0 \\ 0 & \text{if } t < t_0 \end{cases}
\]

\[ h(t,X(t)) = h_0(t) \exp[\delta_1 E g_1(t) + \delta_2 E g_2(t)] \]


For two time intervals, the model can take either 
one of two equivalent forms as shown here. The 
first model contains a main effect of exposure and 
only one heaviside function. The second model 
contains two heaviside functions without a main 
effect of exposure. Both models yield two distinct 
and equivalent values for the hazard ratio. 


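where

\[
g_1(t) = \begin{cases} 1 & \text{if } t \ge t_0 \\ 0 & \text{if } t < t_0 \end{cases}
\qquad
g_2(t) = \begin{cases} 1 & \text{if } t < t_0 \\ 0 & \text{if } t \ge t_0 \end{cases}
\]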
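
The equivalence of the two heaviside parameterizations can be illustrated with a short sketch. It compares an exposed (E = 1) with an unexposed (E = 0) subject and relies on the correspondence delta_1 = beta + delta and delta_2 = beta given in the detailed outline of this chapter. The function names and the numeric values of beta, delta, and t_0 are hypothetical, chosen only for illustration.

```python
import math


def g1(t, t0):
    """Heaviside function equal to 1 at or after t0, else 0."""
    return 1 if t >= t0 else 0


def g2(t, t0):
    """Heaviside function equal to 1 before t0, else 0."""
    return 1 if t < t0 else 0


def hr_one_heaviside(t, t0, beta, delta):
    """HR for E=1 vs E=0 under h0(t) exp[beta*E + delta*E*g(t)]."""
    return math.exp(beta + delta * g1(t, t0))


def hr_two_heavisides(t, t0, delta1, delta2):
    """HR for E=1 vs E=0 under h0(t) exp[delta1*E*g1(t) + delta2*E*g2(t)]."""
    return math.exp(delta1 * g1(t, t0) + delta2 * g2(t, t0))


if __name__ == "__main__":
    beta, delta, t0 = 0.5, -0.3, 365  # hypothetical values
    for t in (100, 365, 800):
        first = hr_one_heaviside(t, t0, beta, delta)
        second = hr_two_heavisides(t, t0, beta + delta, beta)
        print(f"t={t:4d}  model 1 HR={first:.4f}  model 2 HR={second:.4f}")
```

Both forms give one constant hazard ratio before t_0 and another at or after t_0; they differ only in how the coefficients are labeled.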


EXAMPLE 1 


1991 Australian study of heroin addicts 

• two methadone maintenance clinics 

• addicts dataset file 

• clinic variable did not satisfy PH
assumption


We illustrated the use of time-dependent variables
through two examples. The first example
considered the comparison of two methadone
maintenance clinics for heroin addicts. The dataset file
was called addicts. In this example, the clinic
variable, which was a dichotomous exposure variable,
did not satisfy the PH assumption.

EXAMPLE (continued)


[Figure: Adjusted Survival Curves Stratified by Clinic; horizontal axis in days (0 to 1200); one curve labeled Clinic 2]


\[
h(t,X(t)) = h_0(t) \exp[\beta_2(\text{prison}) + \beta_3(\text{dose}) + \delta_1(\text{clinic}) g_1(t) + \delta_2(\text{clinic}) g_2(t)]
\]

(Heaviside functions)

\[
h(t,X(t)) = h_0(t) \exp[\beta_2(\text{prison}) + \beta_3(\text{dose}) + \beta_1(\text{clinic}) + \delta(\text{clinic} \times t)]
\]

where

T(t) 1, 3, 5, 7, 9 in half-year intervals


Adjusted survival curves stratified by clinic 
showed clinic 2 to have consistently higher sur¬ 
vival probabilities than clinic 1, with a more pro¬ 
nounced difference in clinics after one year of 
follow-up. However, this stratification did not al¬ 
low us to obtain a hazard ratio estimate for clinic. 
Such an estimate was possible using an extended 
Cox model containing interaction terms involving 
clinic with time. 


Two extended Cox models were considered. The 
first used heaviside functions to obtain two dis¬ 
tinct hazard ratios, one for the first year of follow¬ 
up and the other for greater than one year of 
follow-up. The model is shown here. 

The second extended Cox model used a time- 
dependent variable that allowed for the two sur¬ 
vival curves to diverge over time. This model is 
shown here. 

Both models yielded hazard ratio estimates that 
agreed reasonably well with the graph of adjusted 
survival curves stratified by clinic. 


EXAMPLE 2: Stanford Heart 
Transplant Study 


Goals: Do patients receiving transplants 
survive longer than patients not receiv¬ 
ing transplants? 


\[
h(t,X(t)) = h_0(t) \exp[\delta_1 HT(t) + \delta_2 TMS(t) + \delta_3 AGE(t)]
\]


Exposure variable 


The second example considered results obtained 
in the Stanford Heart Transplant Study. The goal 
of the study was to assess whether patients receiv¬ 
ing transplants survived longer than patients not 
receiving transplants. 

The analysis of these data involved an extended
Cox model containing three time-dependent
variables. One of these, the exposure variable,
called HT(t), was an indicator of transplant
status at time t. The other two variables, TMS(t) and
AGE(t), gave tissue mismatch scores and age for
transplant patients when time t occurred after
receiving a transplant. The value of each of these
variables was 0 at times prior to receiving a
transplant.

EXAMPLE (continued)


Results: HT(t) highly significant, i.e.,
transplants have better prognosis than
nontransplants.

Hazard ratio estimate problematic:


More appropriate formula:

\[ HR = \exp[-3.1718 + 0.4442\, TMS_i + 0.0552\, AGE_i] \]


The results from fitting the above extended Cox 
model yielded a highly significant effect of the ex¬ 
posure variable, thus indicating that survival prog¬ 
nosis was better for transplants than for nontrans¬ 
plants. 

From these data, we first presented an
inappropriate formula for the estimated hazard ratio. This
formula used the exponential of the coefficient of
the exposure variable, which gave an estimate of
1 over 23.98. A more appropriate formula
considered the values of the covariates TMS(t) and
AGE(t) at time t. Using the latter, the hazard ratio
estimate varied with the tissue mismatch scores
and age of each transplant patient.


Chapters 


1. Introduction to Survival 
Analysis 

2. Kaplan-Meier Curves and the 
Log-Rank Test 

3. The Cox Proportional Hazards 
Model 


4. Evaluating the Proportional 
Hazards Assumption 

5. The Stratified Cox Procedure 




6. Extension of the Cox
Proportional Hazards Model
for Time-Dependent Variables


This presentation is now complete. We suggest 
that the reader review the detailed outline that fol¬ 
lows and then answer the practice exercises and 
test that follow the outline. 

A key property of Cox models is that the distri¬ 
bution of the outcome, survival time, is unspec¬ 
ified. In the next chapter, parametric models are 
presented in which the underlying distribution of 
the outcome is specified. The exponential, Weibull, 
and log-logistic models are examples of paramet¬ 
ric models. 


Next:


7. Parametric models

Detailed
Outline


I. Preview (page 214)

II. Review of the Cox PH Model (pages 214-216)

A. The formula for the Cox PH model:

\[ h(t,X) = h_0(t) \exp\left[ \sum_{i=1}^{p} \beta_i X_i \right] \]

B. Formula for hazard ratio comparing two
individuals, X* = (X_1*, X_2*, ..., X_p*) and X = (X_1, X_2, ..., X_p):

\[ \frac{h(t,X^*)}{h(t,X)} = \exp\left[ \sum_{i=1}^{p} \beta_i (X_i^* - X_i) \right] \]

C. The meaning of the PH assumption:

• Hazard ratio formula shows that the hazard
ratio is independent of time:

\[ \frac{h(t,X^*)}{h(t,X)} = \theta \]

• Hazards for two X's are proportional:
h(t,X*) = θ h(t,X)

D. Three methods for checking the PH assumption:

i. Graphical: Compare ln-ln survival curves or
observed versus predicted curves

ii. Time-dependent covariates: Use product (i.e.,
interaction) terms of the form X x g(t).

iii. Goodness-of-fit test: Use a large sample Z
statistic.

E. Options when the PH assumption is not met:

i. Use a stratified Cox procedure.

ii. Use an extended Cox model containing a
time-dependent variable of the form X x g(t).

III. Definition and Examples of Time-Dependent
Variables (pages 216-219)

A. Definition: any variable whose values differ over
time

B. Examples of defined, internal, and ancillary
time-dependent variables

IV. The Extended Cox Model for Time-Dependent
Variables (pages 219-221)

A.

\[
h(t,X(t)) = h_0(t) \exp\left[ \sum_{i=1}^{p_1} \beta_i X_i + \sum_{j=1}^{p_2} \delta_j X_j(t) \right]
\]

where X(t) = (X_1, X_2, ..., X_{p_1}, X_1(t), X_2(t), ...,
X_{p_2}(t)) denotes the entire collection of predictors
at time t, X_i denotes the ith time-independent
variable, and X_j(t) denotes the jth time-dependent
variable.

B. ML procedure used to estimate regression 
coefficients. 

C. List of computer programs for the extended Cox 
model. 

D. Model assumes that the hazard at time t depends 
on the value of Xj(t) at the same time. 

E. Can modify model for lag-time effect. 

V. The Hazard Ratio Formula for the Extended Cox 

Model (pages 221-223) 


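A.

\[
HR(t) = \exp\left[ \sum_{i=1}^{p_1} \hat{\beta}_i \left[ X_i^* - X_i \right] + \sum_{j=1}^{p_2} \hat{\delta}_j \left[ X_j^*(t) - X_j(t) \right] \right]
\]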


B. Because HR(t) is a function of time, the PH 
assumption is not satisfied. 

C. The estimated coefficient of X_j(t) is
time-independent, and represents an “overall”
effect of X_j(t).

VI. Assessing Time-Independent Variables That Do 
Not Satisfy the PH Assumption (pages 224-229) 

A. General formula for assessing PH assumption: 


\[
h(t,X(t)) = h_0(t) \exp\left[ \sum_{i=1}^{p} \beta_i X_i + \sum_{i=1}^{p} \delta_i X_i g_i(t) \right]
\]

B. g_i(t) is a function of time corresponding to X_i

C. Test H_0: δ_1 = δ_2 = ... = δ_p = 0

D. Heaviside function:

\[
g(t) = \begin{cases} 1 & \text{if } t \ge t_0 \\ 0 & \text{if } t < t_0 \end{cases}
\]

E. The model with a single heaviside function:

\[ h(t,X(t)) = h_0(t) \exp[\beta E + \delta E g(t)] \]

F. The model with two heaviside functions:

\[ h(t,X(t)) = h_0(t) \exp[\delta_1 E g_1(t) + \delta_2 E g_2(t)] \]

where

\[
g_1(t) = \begin{cases} 1 & \text{if } t \ge t_0 \\ 0 & \text{if } t < t_0 \end{cases}
\quad \text{and} \quad
g_2(t) = \begin{cases} 1 & \text{if } t < t_0 \\ 0 & \text{if } t \ge t_0 \end{cases}
\]

G. The hazard ratios:

t ≥ t_0: HR = exp(β + δ) = exp(δ_1)
t < t_0: HR = exp(β) = exp(δ_2)

H. Several heaviside functions: examples given with
four time-intervals:

• Extended Cox model contains either
{E, E x g_1(t), E x g_2(t), E x g_3(t)} or
{E x g_1(t), E x g_2(t), E x g_3(t), E x g_4(t)}

• The model using four product terms and no
main effect of E:

\[
h(t,X(t)) = h_0(t) \exp[\delta_1 E g_1(t) + \delta_2 E g_2(t) + \delta_3 E g_3(t) + \delta_4 E g_4(t)]
\]

where

\[
g_i(t) = \begin{cases} 1 & \text{if } t \text{ is within interval } i \\ 0 & \text{otherwise} \end{cases}
\]

VII. An Application of the Extended Cox Model to an 
Epidemiologic Study on the Treatment of Heroin 
Addiction (pages 230-234) 

A. 1991 Australian study of heroin addicts 

• two methadone maintenance clinics 

• addicts dataset file 

• clinic variable did not satisfy PH assumption 

B. Clinic 2 has consistently higher retention 
probabilities than clinic 1, with a more 
pronounced difference in clinics after one year of 
treatment. 

C. Two extended Cox models were considered:

• Use heaviside functions to obtain two distinct
hazard ratios, one for less than one year and the
other for greater than one year.

• Use a time-dependent variable that allows for
the two survival curves to diverge over time.

VIII. An Application of the Extended Cox Model to the
Analysis of the Stanford Heart Transplant Data
(pages 235-239)

A. The goal of the study was to assess whether
patients receiving transplants survived longer than
patients not receiving transplants.

B. We described an extended Cox model containing
three time-dependent variables:

\[ h(t,X(t)) = h_0(t) \exp[\delta_1 HT(t) + \delta_2 TMS(t) + \delta_3 AGE(t)] \]

C. The exposure variable, called HT(t), was an
indicator of transplant status at time t. The other
two variables, TMS(t) and AGE(t), gave tissue
mismatch scores and age for transplant patients
when time t occurred after receiving a transplant.

D. The results yielded a highly significant effect of the
exposure variable.

E. The use of a hazard ratio estimate for these data was
problematic.

• An inappropriate formula is the exponential of
the coefficient of HT(t), which yields 1/23.98.

• An alternative formula considers the values of
the covariates TMS(t) and AGE(t) at time t.

IX. Extended Cox Likelihood (pages 239-242)

A. Review of PH likelihood (Chapter 3).

B. Barry, Gary, Larry, example of Cox likelihood.

X. Summary (pages 242-245)


Practice
Exercises


The following dataset called “anderson.dat” consists of
remission survival times on 42 leukemia patients, half of whom
receive a new therapy and the other half of whom get a
standard therapy (Freireich et al., Blood, 1963). The exposure
variable of interest is treatment status (Rx = 0 if new treatment,
Rx = 1 if standard treatment). Two other variables for
control are log white blood cell count (i.e., log WBC) and sex.
Failure status is defined by the relapse variable (0 if censored,
1 if failure). The dataset is listed as follows:


Subj   Surv   Relapse   Sex   log WBC   Rx
1      35     0         1     1.45      0
2      34     0         1     1.47      0
3      32     0         1     2.2       0
4      32     0         1     2.53      0
5      25     0         1     1.78      0
6      23     1         1     2.57      0
7      22     1         1     2.32      0
8      20     0         1     2.01      0
9      19     0         0     2.05      0
10     17     0         0     2.16      0
11     16     1         1     3.6       0
12     13     1         0     2.88      0
13     11     0         0     2.6       0
14     10     0         0     2.7       0
15     10     1         0     2.96      0
16     9      0         0     2.8       0
17     7      1         0     4.43      0
18     6      0         0     3.2       0
19     6      1         0     2.31      0
20     6      1         1     4.06      0
21     6      1         0     3.28      0
22     23     1         1     1.97      1
23     22     1         0     2.73      1
24     17     1         0     2.95      1
25     15     1         0     2.3       1
26     12     1         0     1.5       1
27     12     1         0     3.06      1
28     11     1         0     3.49      1
29     11     1         0     2.12      1
30     8      1         0     3.52      1
31     8      1         0     3.05      1
32     8      1         0     2.32      1
33     8      1         1     3.26      1
34     5      1         1     3.49      1
35     5      1         0     3.97      1
36     4      1         1     4.36      1
37     4      1         1     2.42      1
38     3      1         1     4.01      1
39     2      1         1     4.91      1
40     2      1         1     4.48      1
41     1      1         1     2.8       1
42     1      1         1     5         1

The following edited printout gives computer results for
fitting a Cox PH model containing the three predictors Rx, log
WBC, and Sex.


Cox regression
Analysis time _t: survt

           Coef.   Std. Err.   P > |z|   Haz. Ratio   [95% Conf. Interval]   P(PH)
Sex        0.263   0.449       0.558     1.301        0.539    3.139         0.042
log WBC    1.594   0.330       0.000     4.922        2.578    9.397         0.714
Rx         1.391   0.457       0.002     4.018        1.642    9.834         0.500

No. of subjects = 42    Log likelihood = -72.109

1. Which of the variables in the model fitted above are
time-independent and which are time-dependent?

2. Based on this printout, is the PH assumption satisfied for the 
model being fit? Explain briefly. 

3. Suppose you want to use an extended Cox model to assess 
the PH assumption for all three variables in the above model. 
State the general form of an extended Cox model that will 
allow for this assessment. 

4. Suppose you wish to assess the PH assumption for the Sex 
variable using a heaviside function approach designed to 
yield a constant hazard ratio for less than 15 weeks of follow¬ 
up and a constant hazard ratio for 15 weeks or more of follow¬ 
up. State two equivalent alternative extended Cox models that 
will carry out this approach, one model containing one heav¬ 
iside function and the other model containing two heaviside 
functions. 

5. The following is an edited printout of the results obtained 
by fitting an extended Cox model containing two heaviside 
functions: 


Time-Dependent Cox Regression Analysis

Analysis time _t: survt

            Coef.    Std. Err.   P > |z|   Haz. Ratio   [95% Conf. Interval]
log WBC     1.567    0.333       0.000     4.794        2.498    9.202
Rx          1.341    0.466       0.004     3.822        1.533    9.526
0-15 wks    0.358    0.483       0.459     1.430        0.555    3.682
15+ wks     -0.182   0.992       0.855     0.834        0.119    5.831

No. of subjects = 42    Log likelihood = -71.980




Using the above computer results, carry out a test of hypoth¬ 
esis, estimate the hazard ratio, and obtain 95% confidence 
interval for the treatment effect adjusted for log WBC and 
the time-dependent Sex variables. What conclusions do you 
draw about the treatment effect? 

6. We now consider an alternative approach to controlling for 
Sex using an extended Cox model. We define an interaction 
term between sex and time that allows for diverging survival 
curves over time. 

For the situation just described, write down the extended Cox
model, which contains Rx, log WBC, and Sex as main effects
plus the product term sex x time.

7. Using the model described in question 6, express the hazard 
ratio for the effect of Sex adjusted for Rx and log WBC at 8 
and 16 weeks. 

8. The following is an edited printout of computer results 
obtained by fitting the model described in question 6. 

Time-Dependent Cox Regression Analysis

Analysis time _t: survt

             Coef.    Std. Err.   P > |z|   Haz. Ratio   [95% Conf. Interval]
Sex          1.820    1.012       0.072     6.174        0.849    44.896
log WBC      1.464    0.336       0.000     4.322        2.236    8.351
Rx           1.093    0.479       0.022     2.984        1.167    7.626
Sex x Time   -0.345   0.199       0.083     0.708        0.479    1.046

No. of subjects = 42    Log likelihood = -70.416




Based on the above results, describe the hazard ratio estimate
for the treatment effect adjusted for the other variables in the
model, and summarize the results of the significance test and
interval estimate for this hazard ratio. How do these results
compare with the results previously obtained when a heaviside
function approach was used? What does this comparison
suggest about the drawbacks of using an extended Cox model
to adjust for variables not satisfying the PH assumption?

9. The following gives an edited printout of computer results 
using a stratified Cox procedure that stratifies on the Sex 
variable but keeps Rx and log WBC in the model. 

Stratified Cox regression

Analysis time _t: survt

           Coef.   Std. Err.   P > |z|   Haz. Ratio   [95% Conf. Interval]
log WBC    1.390   0.338       0.000     4.016        2.072    7.783
Rx         0.931   0.472       0.048     2.537        1.006    6.396

No. of subjects = 42    Log likelihood = -57.560    Stratified by sex


Compare the results of the above printout with previously
provided results regarding the hazard ratio for the effect of
Rx. Is there any way to determine which set of results is more
appropriate? Explain.

Test

The following questions consider the analysis of data from a
clinical trial concerning gastric carcinoma, in which 90
patients were randomized to either chemotherapy (coded as 2)
alone or to a combination of chemotherapy and radiation
(coded as 1). See Stablein et al., “Analysis of Survival Data
with Nonproportional Hazard Functions,” Controlled
Clinical Trials, vol. 2, pp. 149-159 (1981). A listing of the dataset
(called chemo) is given at the end of the presentation.

1. A plot of the log-log Kaplan-Meier curves for each 
treatment group is shown below. Based on this plot, what 
would you conclude about the PH assumption regarding 
the treatment group variable? Explain. 


[Figure: Log-log Survival Curves for Each Treatment Group; vertical axis from -2.0 to 4.0; horizontal axis from 0 to 1400; curves labeled 1 and 2]

Number at risk

Time:   0    200   400   600   800   1000   1200   1400
        45   26    20    11    10    7      5      2
        45   40    25    17    10    7      6      2


2. The following is an edited printout of computer results 
obtained when fitting the PH model containing only the 
treatment group variable. Based on these results, what 
would you conclude about the PH assumption regarding 
the treatment group variable? Explain. 


Cox regression
Analysis time _t: survt

      Coef.    Std. Err.   P > |z|   Haz. Ratio   [95% Conf. Interval]   P(PH)
Tx    -0.267   0.233       0.253     0.766        0.485    1.21          0

No. of subjects = 90    Log likelihood = -282.744


3. The following printout shows the results from using a
heaviside function approach with an extended Cox model
to fit these data. The model used product terms of the
treatment variable (Tx) with each of three heaviside
functions. The first product term (called Time1) involves a
heaviside function for the period from 0 to 250 days,
the second product term (i.e., Time2) involves the period
from 250 to 500 days, and the third product term (i.e.,
Time3) involves the open-ended period from 500 days and
beyond.


Time-Dependent Cox Regression Analysis

Analysis time _t: survt

         Coef.    Std. Err.   P > |z|   Haz. Ratio   [95% Conf. Interval]
Time1    -1.511   0.461       0.001     0.221        0.089    0.545
Time2    0.488    0.450       0.278     1.629        0.675    3.934
Time3    0.365    0.444       0.411     1.441        0.604    3.440

No. of subjects = 90    Log likelihood = -275.745




Write down the hazard function formula for the extended 
Cox model being used, making sure to explicitly define the 
heaviside functions involved. 

4. Based on the printout, describe the hazard ratios in each 
of the three time intervals, evaluate each hazard ratio for 
significance, and draw conclusions about the extent of the 
treatment effect in each of the three time intervals consid¬ 
ered. 

5. Inspection of the printout provided in question 3 indicates 
that the treatment effect in the second and third intervals 
appears quite similar. Consequently, another analysis was 
considered that uses only two intervals, from 0 to 250 days 
versus 250 days and beyond. Write down the hazard func¬ 
tion formula for the extended Cox model that considers 
this situation (i.e., containing two heaviside functions). 
Also, write down an equivalent alternative hazard func¬ 
tion formula which contains the main effect of treatment 
group plus one heaviside function variable. 

6. For the situation described in question 5, the computer 
results are provided below. Based on these results, 
describe the hazard ratios for the treatment effect below 
and above 250 days, summarize the inference results 
for each hazard ratio, and draw conclusions about the 
treatment effect within each time interval. 

Time-Dependent Cox Regression Analysis

Analysis time _t: survt

Column name   Coeff    StErr   p-value   HR      0.95 CI
Time1         -1.511   0.461   0.001     0.221   0.089    0.545
Time2         0.427    0.315   0.176     1.532   0.826    2.842

No. of subjects = 90    Log likelihood = -275.764

Answers to
Practice
Exercises


1. All three variables in the model are time-independent
variables.

2. The computer results indicate that the Sex variable does not
satisfy the PH assumption because the P(PH) value is 0.042,
which is significant at the 0.05 level.

3.
\[
h(t,X(t)) = h_0(t) \exp[\beta_1(\text{sex}) + \beta_2(\log WBC) + \beta_3(Rx)
+ \delta_1(\text{sex}) g_1(t) + \delta_2(\log WBC) g_2(t) + \delta_3(Rx) g_3(t)]
\]

where the g_i(t) are functions of time.


4. Model 1 (one heaviside function):

$h(t, X(t)) = h_0(t)\exp[\beta_1(\text{sex}) + \beta_2(\log \text{WBC}) + \beta_3(\text{Rx}) + \delta_1(\text{sex})g_1(t)]$

where $g_1(t) = 1$ if $0 \le t < 15$ weeks and $g_1(t) = 0$ if $t \ge 15$ weeks.


Model 2 (two heaviside functions):

$h(t, X(t)) = h_0(t)\exp[\beta_2(\log \text{WBC}) + \beta_3(\text{Rx}) + \delta_1(\text{sex})g_1(t) + \delta_2(\text{sex})g_2(t)]$

where $g_1(t) = 1$ if $0 \le t < 15$ weeks and $g_1(t) = 0$ if $t \ge 15$ weeks,

and $g_2(t) = 1$ if $t \ge 15$ weeks and $g_2(t) = 0$ if $0 \le t < 15$ weeks.


5. The estimated hazard ratio for the effect of Rx is 3.822; this
estimate is adjusted for log WBC and for the Sex variable
considered as two time-dependent variables involving heaviside
functions. The Wald test for significance of Rx has a
p-value of 0.004, which is highly significant. The 95% confidence
interval for the treatment effect ranges between 1.533
and 9.526, which is quite wide, indicating considerable
unreliability of the 3.822 point estimate. Nevertheless, the
results estimate a statistically significant treatment effect of
around 3.8.


6. $h(t, X(t)) = h_0(t)\exp[\beta_1(\text{sex}) + \beta_2(\log \text{WBC}) + \beta_3(\text{Rx}) + \delta_1(\text{sex} \times t)]$






7. The hazard ratio for the effect of Sex at each time point,
controlling for Rx and log WBC, is given as follows:

t = 8 weeks: $HR = \exp[\beta_1 + 8\delta_1]$

t = 16 weeks: $HR = \exp[\beta_1 + 16\delta_1]$

8. Using the model containing Sex, log WBC, Rx, and Sex × Time,
the estimated hazard ratio for the treatment effect is
given by 2.984, with a p-value of 0.022 and a 95% confidence
interval ranging between 1.167 and 7.626. The point
estimate of 2.984 is quite different from the point estimate
of 3.822 for the heaviside function model, although the
confidence intervals for both models are wide enough to
include both estimates. The discrepancy between point estimates
demonstrates that when a time-dependent variable
approach is to be used to account for a variable not satisfying
the PH assumption, different results may be obtained
from different choices of time-dependent variables.

9. The stratified Cox analysis yields a hazard ratio of 2.537 with
a p-value of 0.048 and a 95% CI ranging between 1.006 and
6.396. The point estimate is much closer to the 2.984 for
the model containing the Sex × Time product term than to
the 3.822 for the model containing two heaviside functions.
One way to choose between models would be to compare 
goodness-of-fit test statistics for each model; another way is 
to compare graphs of the adjusted survival curves for each 
model and determine by eye which set of survival curves fits 
the data better. 


Parametric Survival Models
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Introduction 


The Cox model is the most widely used survival model in the
health sciences, but it is not the only model available. In this
chapter we present a class of survival models, called parametric
models, in which the distribution of the outcome (i.e., the
time to event) is specified in terms of unknown parameters.
Many parametric models are accelerated failure time models
in which survival time is modeled as a function of predictor
variables. We examine the assumptions that underlie accelerated
failure time models and compare the acceleration factor
as an alternative measure of association to the hazard ratio.
We present examples of the exponential, Weibull, and
log-logistic models and give a brief description of other parametric
approaches. The parametric likelihood is constructed and
described in relation to left-, right-, and interval-censored data.
Binary regression is presented as an alternative approach for
modeling interval-censored outcomes. The chapter concludes
with a discussion of frailty models.


Abbreviated Outline

The outline below gives the user a preview of the material
covered by the presentation. A detailed outline for review
purposes follows the presentation.

I. Overview

II. Probability Density Function in Relation to the
Hazard and Survival Function

III. Exponential Example

IV. Accelerated Failure Time Assumption

V. Exponential Example Revisited

VI. Weibull Example

VII. Log-Logistic Example

VIII. A More General Form of the AFT Model

IX. Other Parametric Models

X. The Parametric Likelihood

XI. Interval-Censored Data

XII. Frailty Models

XIII. Summary
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Objectives 


Upon completing this chapter, the learner should be able to: 

1. State or recognize the form of a parametric survival 
model and contrast it with a Cox model. 

2. State common distributions used for parametric survival 
models. 

3. Contrast an AFT model with a PH model. 

4. Interpret output from an exponential survival model. 

5. Interpret output from a Weibull survival model. 

6. Interpret output from a log-logistic survival model.

7. State or recognize the formulation of a parametric likelihood.

8. State or recognize right-censored, left-censored, and
interval-censored data.

9. State or recognize the form of a frailty model and the 
purpose of including a frailty component. 

10. Interpret the output obtained from a frailty model. 
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Presentation 


I. Overview 


Focus:

• parametric models
• exponential example
• AFT vs. PH
• Weibull example
• log-logistic example
• other approaches
• parametric likelihood
• interval-censoring
• frailty models


Parametric Modeling 

• Outcome assumed to follow 
some family of distributions 

• Exact distribution is unknown 
if parameters are unknown 

• Data used to estimate 
parameters 

• Examples of parametric models: 
o Linear regression 

o Logistic regression 
o Poisson regression 


Distributions commonly used for 
parametric survival models: 

• Weibull 

• Exponential 

• Log-logistic 

• Lognormal 

• Generalized gamma 


In this chapter we present parametric survival
models and the assumptions that underlie these
models. Specifically, we examine the accelerated
failure time (AFT) assumption and contrast it
with the proportional hazards (PH) assumption.
We present examples of several parametric models,
including the exponential model, the Weibull
model, and the log-logistic model. We discuss the
parametric likelihood and how it accommodates
left-, right-, and interval-censored data. We
also consider models that include a frailty component
to account for unobserved heterogeneity.

Linear regression, logistic regression, and Poisson
regression are examples of parametric models that
are commonly used in the health sciences. With
these models, the outcome is assumed to follow
some distribution such as the normal, binomial,
or Poisson distribution. Typically, what is actually
meant is that the outcome follows some family of
distributions of similar form with unknown parameters.
It is only when the value of the parameter(s)
is known that the exact distribution is fully
specified. For example, if one distribution is normal
with a mean of three and another distribution
is normal with a mean of seven, the distributions
are of the same family (i.e., normal) but they are
not exactly the same distribution. For parametric
regression models, the data are typically used to
estimate the values of the parameters that fully
specify that distribution.

A parametric survival model is one in which
survival time (the outcome) is assumed to follow
a known distribution. Examples of distributions
that are commonly used for survival time
are: the Weibull, the exponential (a special case
of the Weibull), the log-logistic, the lognormal,
and the generalized gamma, all of which are
supported by SAS or Stata software.
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Parametric survival models 

↓

Distribution specified for time

Cox model is semiparametric:

↓

Baseline survival not specified


The Cox proportional hazards model, by contrast,
is not a fully parametric model. Rather it is a
semiparametric model because even if the regression
parameters (the betas) are known, the distribution
of the outcome remains unknown. The baseline
survival (or hazard) function is not specified in a
Cox model.


Cox model widely popular: 

• No reliance on assumed 
distribution 

• Computer packages can output 
Cox-adjusted survival estimates 
using algorithm that generalizes 
KM 

• Baseline not necessary for 
estimation of hazard ratio 


A key reason why the Cox model is widely popular
is that it does not rely on distributional
assumptions for the outcome. Although the baseline
survival function is not estimated with a Cox
model, computer packages such as SAS, Stata, and
SPSS can output Cox-adjusted survival estimates
(see Computer Appendix) by using a complicated
algorithm that generalizes the Kaplan-Meier (KM)
approach while making use of estimated
regression coefficients obtained from a Cox
model (Kalbfleisch and Prentice, 1980). Also, an
estimate of the baseline hazard is not necessary
for the estimation of a hazard ratio because the
baseline hazard cancels in the calculation.



[Figure: S(t) plotted against t, starting at 1.0]


In theory, as time ranges from 0 to infinity,
the survival function can be graphed as a
smooth curve from S(0) = 1 to S(∞) = 0 (see
Chapter 1). Kaplan-Meier and Cox-adjusted survival
estimates use empirical nondistributional
methods that typically graph as step functions,
particularly if the sample size is small. If in the
data, for example, an event occurred at 3 weeks
and the next event occurred at 7 weeks, then the
estimated survival curve would be flat between 3 and
7 weeks using these nondistributional approaches.
Moreover, if the study ends with subjects still
remaining at risk, then the estimated survival function
would not go all the way down to zero.


[Figure: step function (nondistributional estimates), with time-axis ticks at 0, 3, and 7]
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Appeal of Parametric Survival 
Models 

• More consistent with theoretical 
S(t) than nondistributional 
approaches 

• Simplicity 

• Completeness—h(t) and S(t) 
specified 


Survival estimates obtained from parametric survival
models typically yield plots more consistent
with a theoretical survival curve. If the investigator
is comfortable with the underlying distributional
assumption, then parameters can be estimated
that completely specify the survival and
hazard functions. This simplicity and completeness
are the main appeals of using a parametric
approach.


II. Probability Density 
Function in Relation 
to the Hazard and 
Survival Function 


Probability density function $f(t)$ known
=> survival and hazard can be found

$S(t) = P(T > t) = \int_t^{\infty} f(u)\,du$

$h(t) = \dfrac{-d[S(t)]/dt}{S(t)}$


For parametric survival models, time is assumed
to follow some distribution whose probability density
function f(t) can be expressed in terms of
unknown parameters. Once a probability density
function is specified for survival time, the corresponding
survival and hazard functions can be determined.
The survival function S(t) = P(T > t)
can be ascertained from the probability density
function by integrating over the probability density
function from time t to infinity. The hazard can
then be found by dividing the negative derivative
of the survival function by the survival function
(see left).


Survival in terms of hazard:

$S(t) = \exp\left(-\int_0^t h(u)\,du\right)$

Cumulative hazard:

$\int_0^t h(u)\,du$

$f(t) = h(t)S(t)$


The survival function can also be expressed in
terms of the hazard function (see Chapter 1) by
exponentiating the negative of the cumulative hazard
function. The cumulative hazard function is
the integral of the hazard function between
integration limits of 0 and t.


Finally, the probability density function can be expressed
as the product of the hazard and the survival
functions, f(t) = h(t)S(t).


Key Point 

Specifying one of f(t), S(t), or h(t) 
specifies all three functions 


The key point is that specifying any one of the
probability density function, survival function,
or hazard function allows the other two functions
to be ascertained by using the formulas
shown on the left.
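
The relationships among f(t), S(t), and h(t) can be checked numerically. The following is a minimal Python sketch and not part of the original presentation. It assumes a hazard of the Weibull form h(t) = λp t^(p−1) (tabulated later in this chapter) with hypothetical values λ = 0.1 and p = 1.5, builds S(t) from the cumulative hazard by numerical integration, and recovers f(t) both as h(t)S(t) and as the negative derivative of S(t). All function names are illustrative.

```python
import math

def weibull_hazard(t, lam, p):
    """Hazard h(t) = lam * p * t**(p - 1) (Weibull form)."""
    return lam * p * t ** (p - 1)

def survival_from_hazard(hazard, t, steps=20000):
    """S(t) = exp(-H(t)), where H(t) is the integral of h(u) from 0 to t.

    The integral is approximated with the midpoint rule.
    """
    width = t / steps
    cumulative_hazard = sum(hazard((i + 0.5) * width) for i in range(steps)) * width
    return math.exp(-cumulative_hazard)

# Hypothetical parameter values, chosen only for illustration.
lam, p, t = 0.1, 1.5, 4.0

def hazard(u):
    return weibull_hazard(u, lam, p)

s_numeric = survival_from_hazard(hazard, t)   # S(t) built from h(t)
s_closed = math.exp(-lam * t ** p)            # closed-form S(t) for this hazard
f_from_product = hazard(t) * s_numeric        # f(t) = h(t) S(t)

# f(t) is also -dS/dt; approximate the derivative by a central difference.
eps = 1e-5
f_from_derivative = -(math.exp(-lam * (t + eps) ** p)
                      - math.exp(-lam * (t - eps) ** p)) / (2 * eps)

print(round(s_numeric, 4), round(s_closed, 4))
print(round(f_from_product, 4), round(f_from_derivative, 4))
```

Starting from the hazard alone, the sketch reproduces both the survival function and the density, which is the sense in which specifying one of the three functions specifies all three.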
Survival and Hazard Functions for Selected Distributions

Distribution    S(t)                          h(t)
Exponential     $\exp(-\lambda t)$            $\lambda$
Weibull         $\exp(-\lambda t^p)$          $\lambda p t^{p-1}$
Log-logistic    $\frac{1}{1 + \lambda t^p}$   $\frac{\lambda p t^{p-1}}{1 + \lambda t^p}$

$f(t) = h(t)S(t)$

For example, Weibull:
$f(t) = \lambda p t^{p-1}\exp(-\lambda t^p)$
because $h(t) = \lambda p t^{p-1}$ and
$S(t) = \exp(-\lambda t^p)$


On the left is a table containing the survival and
hazard functions for three of the more commonly
used distributions for survival models: the
exponential, Weibull, and log-logistic distributions.


The exponential is a one-parameter distribution
with a constant hazard $\lambda$. The Weibull and
log-logistic distributions have two parameters, $\lambda$ and $p$.
Notice that the Weibull distribution reduces to the
exponential if $p = 1$. The probability density function
for these distributions can be found by multiplying
h(t) and S(t). As an example, the Weibull
probability density function is shown on the left.
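
As a hedged illustration (not from the original text), the three pairs of survival and hazard functions can be written as small Python functions. The parameter values below are hypothetical; the sketch checks that the Weibull reduces to the exponential when p = 1 and that the density is the product h(t)S(t).

```python
import math

def exponential(t, lam):
    """Return (S(t), h(t)) for the exponential distribution."""
    return math.exp(-lam * t), lam

def weibull(t, lam, p):
    """Return (S(t), h(t)) for the Weibull distribution."""
    return math.exp(-lam * t ** p), lam * p * t ** (p - 1)

def log_logistic(t, lam, p):
    """Return (S(t), h(t)) for the log-logistic distribution."""
    denom = 1.0 + lam * t ** p
    return 1.0 / denom, lam * p * t ** (p - 1) / denom

def density(survival, hazard):
    """f(t) = h(t) * S(t)."""
    return hazard * survival

# Hypothetical values, for illustration only.
lam, p, t = 0.2, 1.7, 3.0

s_exp, h_exp = exponential(t, lam)
s_wei1, h_wei1 = weibull(t, lam, 1.0)      # Weibull with p = 1
s_wei, h_wei = weibull(t, lam, p)
s_ll, h_ll = log_logistic(t, lam, p)

print("exponential:", s_exp, h_exp)
print("Weibull, p = 1:", s_wei1, h_wei1)
print("Weibull density:", density(s_wei, h_wei))
print("log-logistic:", s_ll, h_ll)
```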


Typically in parametric models:

• $\lambda$ reparameterized for
regression

• $p$ held fixed


Typically for parametric survival models, the parameter
$\lambda$ is reparameterized in terms of predictor
variables and regression parameters and the
parameter $p$ (sometimes called the shape parameter)
is held fixed. This is illustrated in the examples
to come.


III. Exponential Example 


Simplest parametric survival model:
Hazard function: $h(t) = \lambda$
(where $\lambda$ is a constant)


The first example we consider is the exponential
model, which is the simplest parametric survival
model in that the hazard is constant over time (i.e.,
$h(t) = \lambda$). The model is applied to the remission
data (Freireich et al., 1963), in which 42 leukemia
patients were followed until remission or censorship.
Twenty-one patients received an experimental
treatment (coded TRT = 1) and the other 21
received a placebo (coded TRT = 0). The data
are listed in Chapter 1. The variable TRT is just
a reverse coding of the variable RX presented in
Chapter 3.


EXAMPLE 


Remission data (n = 42) 

21 patients given treatment (TRT = 1) 
21 patients given placebo (TRT = 0) 
$h(t) = \lambda = \exp(\beta_0 + \beta_1 \text{TRT})$

TRT = 1: $h(t) = \exp(\beta_0 + \beta_1)$
TRT = 0: $h(t) = \exp(\beta_0)$

HR(TRT = 1 vs. TRT = 0) $= \dfrac{\exp(\beta_0 + \beta_1)}{\exp(\beta_0)} = \exp(\beta_1)$


For simplicity, we demonstrate an exponential
model that has TRT as the only predictor. We
state the model in terms of the hazard by
reparameterizing $\lambda$ as $\exp(\beta_0 + \beta_1 \text{TRT})$. With this
model, the hazard for subjects in the treated group
is $\exp(\beta_0 + \beta_1)$ and the hazard for the placebo
group is $\exp(\beta_0)$. The hazard ratio comparing the
treatment and placebo (see left side) is the ratio of
these hazards, $\exp(\beta_1)$. The exponential model is a
proportional hazards model.


Constant Hazards
=> Proportional Hazards

Proportional Hazards
=/=> Constant Hazards

Exponential Model: hazards are
constant

Cox PH Model: hazards are proportional,
not necessarily constant


The assumption that the hazard is constant for
each pattern of covariates is a much stronger
assumption than the PH assumption. If the hazards
are constant, then of course the ratio of the hazards
is constant. However, the hazard ratio being
constant does not necessarily mean that each
hazard is constant. In a Cox PH model the baseline
hazard is not assumed constant. In fact, the
form of the baseline hazard is not even specified.


Remission Data 

Exponential regression 
log relative-hazard form 

_t        Coef.     Std. Err.    z        p>|z|
trt       -1.527    .398         -3.83    0.00
_cons     -2.159    .218         -9.90    0.00


Output from running the exponential model is
shown on the left. The model was run using Stata
software (version 7.0). The parameter estimates
are listed under the column called Coef. The
parameter estimate for the coefficient of TRT ($\beta_1$) is
−1.527. The estimate of the intercept (called _cons
in the output) is −2.159. The standard errors (Std.
Err.), Wald test statistics (z), and p-values for the
Wald test are also provided. The output indicates
that the z test statistic for TRT is statistically
significant with a p-value <0.005 (rounds to 0.00 in
the output).


Coefficient estimates obtained by MLE

↓

asymptotically normal


The regression coefficients are estimated using 
maximum likelihood estimation (MLE), and 
are asymptotically normally distributed. 
TRT = 1: $\hat{h}(t) = \exp(-2.159 + (-1.527)) = 0.025$
TRT = 0: $\hat{h}(t) = \exp(-2.159) = 0.115$

$\widehat{HR}$ (TRT = 1 vs. 0) $= \exp(-1.527) = 0.22$

95% CI $= \exp[-1.527 \pm 1.96(0.398)] = (0.10, 0.47)$


The estimated hazards for TRT = 1 and TRT =
0 are shown on the left. The estimated hazard
ratio of 0.22 is obtained by exponentiating the
estimated coefficient (−1.527) of the TRT variable.
A 95% confidence interval can be calculated as
$\exp[-1.527 \pm 1.96(0.398)]$, yielding a CI of (0.10,
0.47). These results suggest that the experimental
treatment delays remission.
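
The arithmetic behind these estimates can be reproduced directly from the reported coefficients. The short Python sketch below is an illustration added here, not part of the original output. It uses only the values −2.159, −1.527, and 0.398 from the printout; the variable names are arbitrary.

```python
import math

# Estimates taken from the exponential (PH form) output above.
b0 = -2.159      # intercept (_cons)
b1 = -1.527      # coefficient of TRT
se_b1 = 0.398    # standard error of the TRT coefficient

hazard_placebo = math.exp(b0)          # TRT = 0
hazard_treated = math.exp(b0 + b1)     # TRT = 1
hazard_ratio = math.exp(b1)            # TRT = 1 vs. TRT = 0

# Wald-type 95% confidence interval, computed on the log scale
# and then exponentiated.
ci_lower = math.exp(b1 - 1.96 * se_b1)
ci_upper = math.exp(b1 + 1.96 * se_b1)

print(f"h(t), TRT = 1: {hazard_treated:.3f}")
print(f"h(t), TRT = 0: {hazard_placebo:.3f}")
print(f"HR: {hazard_ratio:.2f}  95% CI: ({ci_lower:.2f}, {ci_upper:.2f})")
```

Note that the ratio of the two estimated hazards equals exp(b1) exactly, because the intercept cancels.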


Results: suggest treatment lowers 
hazard 


Parametric models 

• Need not be PH models 

• Many are AFT models 

Exponential and Weibull 

• Accommodate PH and AFT 
assumptions 


Up to this point in the book, the key assumption
for survival models has been the proportional
hazards assumption. However, parametric survival
models need not be PH models. Many parametric
models are accelerated failure time models
rather than PH models. The exponential and
Weibull distributions can accommodate both the
PH and AFT assumptions.


Remission Data 

Exponential regression 
accelerated failure-time form 


_t Coef. Std. Err. z p >|z| 


trt 1.527 .398 3.83 0.00 

_cons 2.159 .218 9.90 0.00 


On the left is Stata output from the AFT form of
the exponential model with TRT as the only predictor.
Stata can output either the PH or the AFT form
of an exponential or Weibull model (see Computer
Appendix). SAS (version 8.2) only runs the AFT
form of parametric models and SPSS (version
11.5) does not yet provide commands to run
parametric models.


AFT vs. PH 

• Different interpretation of 
parameters 

• AFT applies to comparison of 
survival times 

• PH applies to comparison of 
hazards 


The interpretation of parameters differs for AFT 
and PH models. The AFT assumption is applicable 
for a comparison of survival times whereas the 
PH assumption is applicable for a comparison of 
hazards. In the following sections we discuss the 
AFT assumption and then revisit this example and 
discuss the AFT form of this model. 
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IV. Accelerated Failure Time 
Assumption 

AFT—Multiplicative effect with 
survival time 

PH —Multiplicative effect with 
hazard 


The underlying assumption for AFT models is that
the effect of covariates is multiplicative (proportional)
with respect to survival time, whereas
for PH models the underlying assumption is that
the effect of covariates is multiplicative with
respect to the hazard.



[Figure: two panels, "Survival Function For Dogs" and "Survival Function For Humans"]

AFT models: 

Describe “stretching out” or 
contraction of survival time 


To illustrate the idea underlying the AFT assumption,
consider the lifespan of dogs. It is often said
that dogs grow older seven times faster than humans.
So a 10-year-old dog is in some way equivalent
to a 70-year-old human. In AFT terminology
we might say the probability of a dog surviving
past 10 years equals the probability of a human
surviving past 70 years. Similarly, we might say the
probability of a dog surviving past 6 years equals
the probability of a human surviving past 42 years
because 42 equals 6 times 7. More generally we can
say $S_D(t) = S_H(7t)$, where $S_D(t)$ and $S_H(t)$ are the
survival functions for dogs and humans, respectively.
In this framework dogs can be viewed, on
average, as accelerating through life 7 times faster
than humans. Or from the other perspective, the
lifespan of humans, on average, is stretched out 7
times longer than the lifespan of dogs. AFT models
describe this “stretching out” or contraction
of survival time as a function of predictor
variables.


Second Illustration 

$S_1(t)$: survival function for smokers

$S_2(t)$: survival function for nonsmokers


AFT assumption:

$S_2(t) = S_1(\gamma t)$ for $t \ge 0$
$\gamma$ is the acceleration factor

If $\gamma = \exp(\alpha)$:

$S_2(t) = S_1([\exp(\alpha)]t)$
or
$S_2([\exp(-\alpha)]t) = S_1(t)$


For a second illustration of the accelerated failure
time assumption consider a comparison of
survival functions among smokers $S_1(t)$ and
nonsmokers $S_2(t)$. The AFT assumption can be expressed
as $S_2(t) = S_1(\gamma t)$ for $t \ge 0$, where $\gamma$ is a
constant called the acceleration factor comparing
smokers to nonsmokers. In a regression framework
the acceleration factor $\gamma$ could be parameterized
as $\exp(\alpha)$ where $\alpha$ is a parameter to be
estimated from the data. With this parameterization,
the AFT assumption can be expressed
as $S_2(t) = S_1(\exp(\alpha)t)$ or equivalently:
$S_2(\exp(-\alpha)t) = S_1(t)$ for $t \ge 0$.
Suppose $\exp(\alpha) = 0.75$
then

$S_2(80) = S_1(60)$
$S_2(40) = S_1(30)$


More generally
$S_2(t) = S_1(0.75t)$


Suppose $\exp(\alpha) = 0.75$; then the probability of a
nonsmoker surviving 80 years equals the probability
of a smoker surviving 80(0.75) or 60 years.
Similarly, the probability of a nonsmoker surviving
40 years equals the probability of a smoker
surviving 30 years. More generally, the probability
of a nonsmoker surviving t years equals the probability
of a smoker surviving 0.75 times t years (i.e.,
$S_2(t) = S_1(0.75t)$).
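
A small Python sketch can make this relation concrete. It is an illustration only: the smokers' survival function used below is a hypothetical curve invented for the example (it is not estimated from any data in this text), and the function names are arbitrary. The nonsmokers' curve is then defined through the AFT relation with exp(α) = 0.75.

```python
import math

GAMMA = 0.75  # acceleration factor exp(alpha), smokers vs. nonsmokers

def survival_smokers(t):
    """Hypothetical S1(t) for smokers (assumed curve, for illustration only)."""
    return math.exp(-(t / 70.0) ** 4)

def survival_nonsmokers(t):
    """S2(t) = S1(gamma * t), the AFT assumption."""
    return survival_smokers(GAMMA * t)

for age in (40, 80):
    print(f"S2({age}) = {survival_nonsmokers(age):.4f}, "
          f"S1({GAMMA * age:g}) = {survival_smokers(GAMMA * age):.4f}")
```

Whatever curve is chosen for the smokers, the nonsmokers' curve is the same curve with the time axis stretched by a factor of 1/0.75.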


$T_1$: survival time for smokers
$T_2$: survival time for nonsmokers

AFT assumption in terms of random
variables:

$T_1 = \gamma T_2$


The AFT assumption can also be expressed in
terms of random variables for survival time rather
than the survival function. If $T_2$ is a random variable
(following some distribution) representing
the survival time for nonsmokers and $T_1$ is a random
variable representing the survival time for
smokers, then the AFT assumption can be expressed
as $T_1 = \gamma T_2$.


Acceleration factor 
Measure of association 
on survival time 

Hazard ratio 

Measure of association on the 
hazard 


The acceleration factor is the key measure of
association obtained in an AFT model. It allows the
investigator to evaluate the effect of predictor variables
on survival time just as the hazard ratio allows
the evaluation of predictor variables on the
hazard.


Acceleration factor ($\gamma$)

• Describes stretching or 
contraction of S(t) 

• Ratio of times to any fixed 
value of S(t) 

Suppose $\gamma = 2.0$

(Group 2 vs. Group 1) 

• Time to S(t) = 0.50 (median) is 
double for Group 2 

• Time to S(t) = 0.20 is double for 
Group 2 

• Time to S(t) = 0.83 is double for 
Group 2 

• Time to S(t) = 0.98 is double for 
Group 2 

• Time to S(t) = q is double for 
Group 2 (generalization) 


The acceleration factor describes the “stretching
out” or contraction of survival functions when
comparing one group to another. More precisely,
the acceleration factor is a ratio of survival
times corresponding to any fixed value of S(t).
For example, if the acceleration factor comparing
subjects in Group 2 vs. Group 1 is $\gamma = 2.0$, then
the median survival time (value of t when S(t) =
0.5) for Group 2 is double the median survival time
for Group 1. Moreover, the time it takes for S(t) to
equal 0.2 or 0.83 or 0.98 is double for Group 2
compared to Group 1 for the same value of S(t).
In general, the acceleration factor is a ratio of survival
times corresponding to any quantile of survival
time (S(t) = q).
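
The "ratio of times at any fixed S(t)" property can be checked numerically. In the Python sketch below, which is an added illustration, the Group 1 survival function is a hypothetical log-logistic curve (λ = 0.01, p = 2, values assumed for the example), Group 2 is defined through the AFT relation with γ = 2, and the time at which each curve reaches S(t) = q is found by bisection. The helper names are illustrative.

```python
GAMMA = 2.0  # acceleration factor, Group 2 vs. Group 1

def survival_group1(t):
    """Hypothetical S(t) for Group 1: log-logistic with lambda = 0.01, p = 2."""
    return 1.0 / (1.0 + 0.01 * t ** 2)

def survival_group2(t):
    """AFT assumption: Group 2 reaches each S(t) value at gamma times the Group 1 time."""
    return survival_group1(t / GAMMA)

def time_to_quantile(survival, q, upper=1e6, iterations=200):
    """Find t with survival(t) = q by bisection (survival must be decreasing)."""
    lower = 0.0
    for _ in range(iterations):
        middle = (lower + upper) / 2.0
        if survival(middle) > q:
            lower = middle
        else:
            upper = middle
    return (lower + upper) / 2.0

ratios = {}
for q in (0.98, 0.83, 0.50, 0.20):
    t1 = time_to_quantile(survival_group1, q)
    t2 = time_to_quantile(survival_group2, q)
    ratios[q] = t2 / t1
    print(f"S(t) = {q:.2f}: t1 = {t1:.3f}, t2 = {t2:.3f}, ratio = {t2 / t1:.3f}")
```

The ratio is the same at every quantile, which is exactly what distinguishes an AFT relation from an arbitrary pair of survival curves.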
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Survival curves for Group 1 (G = 1) 
and Group 2 (G = 2) 


This idea is graphically illustrated by examining
the survival curves for Group 1 (G = 1) and Group
2 (G = 2) shown on the left. For any fixed value of
S(t), the distance of the horizontal line from the
S(t) axis to the survival curve for G = 2 is double
the distance to the survival curve for G = 1. Notice
the median survival time (as well as the 25th
and 75th percentiles) is double for G = 2. For AFT
models, this ratio of survival times is assumed constant
for all fixed values of S(t).


Horizontal lines are twice as long to
G = 2 compared to G = 1 because
$\gamma = 2$


V. Exponential Example 
Revisited 


Remission data (n = 42) 

21 patients given treatment (TRT = 1) 
21 patients given placebo (TRT = 0) 


Previously discussed PH form of 
model 

Now discuss AFT form of model 


We return to the exponential example applied to 
the remission data with treatment status (TRT) as 
the only predictor. In Section III, results from the 
PH form of the exponential model were discussed. 
In this section we discuss the AFT form of the 
model. 


Exponential survival and hazard
functions:

$S(t) = \exp(-\lambda t)$
$h(t) = \lambda$

Recall for PH model:
$h(t) = \lambda = \exp(\beta_0 + \beta_1 \text{TRT})$


The exponential survival and hazard functions are
shown on the left. Recall that the exponential hazard
is constant and can be reparameterized as a
PH model, $h(t) = \lambda = \exp(\beta_0 + \beta_1 \text{TRT})$. In this
section we show how S(t) can be reparameterized
as an AFT model.
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AFT assumption 
(comparing 2 levels) 

• Ratio of times is constant to all 
fixed S(t) 

Strategy for developing the model:

• Solve for t in terms of S(t)

• Scale t in terms of the predictors


The underlying AFT assumption, for comparing
two levels of covariates, is that the ratio of times
to any fixed value of S(t) = q is constant for any
probability q. We develop the model with the survival
function and solve for t in terms of S(t). We
then scale t in terms of the predictors.

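$S(t) = \exp(-\lambda t)$

$t = [-\ln S(t)] \times \dfrac{1}{\lambda}$

let $\dfrac{1}{\lambda} = \exp(\alpha_0 + \alpha_1 \text{TRT})$

$t = [-\ln S(t)] \times \exp(\alpha_0 + \alpha_1 \text{TRT})$
(scaling of $t$)

Median survival time, $S(t) = 0.5$:

$t_m = [-\ln(0.5)] \times \exp(\alpha_0 + \alpha_1 \text{TRT})$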


The exponential survival function is $S(t) = \exp(-\lambda t)$.
By solving for t, we can obtain a formula
for t in terms of S(t). Taking the natural
log, multiplying by negative 1, and then multiplying
by the reciprocal of $\lambda$ yields the expression
for t shown on the left. By reparameterizing
$1/\lambda = \exp(\alpha_0 + \alpha_1 \text{TRT})$, or equivalently
$\lambda = \exp[-(\alpha_0 + \alpha_1 \text{TRT})]$, it can be seen how the
predictor variable TRT is used to scale the time to
any fixed value of S(t) (see left). For example, to
find an expression for the median survival time $t_m$,
substitute S(t) = 0.5 (see left).


Let $S(t) = q$

$t = [-\ln(q)] \times \exp(\alpha_0 + \alpha_1 \text{TRT})$

Acceleration Factor:
$\gamma$ (TRT = 1 vs. TRT = 0)

$\gamma = \dfrac{[-\ln(q)]\exp(\alpha_0 + \alpha_1)}{[-\ln(q)]\exp(\alpha_0)} = \exp(\alpha_1)$


The expression for t is restated on the left in terms
of any fixed probability S(t) = q. The acceleration
factor $\gamma$ is found by taking the ratio of the times to
S(t) = q for TRT = 1 and TRT = 0. After canceling,
$\gamma$ reduces to $\exp(\alpha_1)$.
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Remission Data 

Exponential regression accelerated 
failure-time form 


_t        Coef.    Std. Err.    z       p>|z|
trt       1.527    .398         3.83    0.00
_cons     2.159    .218         9.90    0.00

$\hat{\gamma} = \exp(1.527) = 4.60$
95% CI: $\exp[1.527 \pm 1.96(0.398)] = (2.11, 10.05)$


$t = [-\ln(q)] \times \exp(\alpha_0 + \alpha_1 \text{TRT})$
$\hat{t} = [-\ln(q)] \times \exp(2.159 + 1.527(\text{TRT}))$

Estimated Survival Times by S(t)
Quartiles for TRT = 1 and
TRT = 0 (Exponential Model)

S(t) = q    $\hat{t}_{TRT=0}$    $\hat{t}_{TRT=1}$
0.25        12.0                 55.3
0.50         6.0                 27.6
0.75         2.5                 11.5


$\hat{\gamma} = 4.60$ (for TRT = 1 vs. TRT = 0)
Ratio of survival times:

$\dfrac{55.3}{12.0} = \dfrac{27.6}{6.0} = \dfrac{11.5}{2.5} = 4.60$


Effect of treatment: 


• Stretches survival by a factor of 
4.6 

• Interpretation of $\gamma$ has intuitive
appeal


On the left is Stata output from the AFT form
of the exponential model with TRT as the only
predictor. The estimate of the coefficient for TRT
is 1.527 with a standard error of 0.398. An estimate
of the acceleration factor for treatment is
$\hat{\gamma} = \exp(1.527) = 4.60$. A 95% confidence interval
for $\gamma$ is calculated as $\exp[1.527 \pm 1.96(0.398)]$,
yielding a CI of (2.11, 10.05).


The parameter estimates can be used to estimate
the time t to any value of S(t) = q. The table
on the left lists the estimated time (in weeks) for
the first, second (median), and third quartiles of
S(t) using the expression for t shown above for
both the treated and placebo groups. In this example
survival time is the time to remission for
leukemia patients.


The ratio of survival times for each row in the table
comparing TRT = 1 vs. TRT = 0 is 4.60, which
not coincidentally is the estimate of the acceleration
factor (see left). The estimated acceleration factor
suggests that the experimental treatment is effective
for delaying remission by stretching survival
time by a factor of 4.60. Although the hazard ratio
is a more familiar measure of association for
health scientists, the acceleration factor has an
intuitive appeal, particularly for describing the
efficacy of a treatment on survival.
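
The acceleration factor, its confidence interval, and the table of estimated times can all be recomputed from the two reported coefficients. The Python sketch below is an added illustration using only the values 2.159, 1.527, and 0.398 from the AFT output; the function and variable names are arbitrary.

```python
import math

# Estimates taken from the exponential (AFT form) output above.
a0 = 2.159       # intercept (_cons)
a1 = 1.527       # coefficient of TRT
se_a1 = 0.398    # standard error of the TRT coefficient

def time_to_survival(q, trt):
    """Estimated time t at which S(t) = q: t = [-ln(q)] * exp(a0 + a1 * TRT)."""
    return -math.log(q) * math.exp(a0 + a1 * trt)

gamma_hat = math.exp(a1)
ci_lower = math.exp(a1 - 1.96 * se_a1)
ci_upper = math.exp(a1 + 1.96 * se_a1)
print(f"gamma = {gamma_hat:.2f}, 95% CI = ({ci_lower:.2f}, {ci_upper:.2f})")

# Rows of (q, time for TRT = 0, time for TRT = 1), in weeks.
rows = [(q, time_to_survival(q, 0), time_to_survival(q, 1))
        for q in (0.25, 0.50, 0.75)]
for q, t_placebo, t_treated in rows:
    print(f"S(t) = {q:.2f}: TRT = 0 -> {t_placebo:.1f}, "
          f"TRT = 1 -> {t_treated:.1f}, ratio = {t_treated / t_placebo:.2f}")
```

Because the factor −ln(q) appears in both the numerator and the denominator, the ratio in each row is exp(a1) regardless of q. Small differences between this ratio and a ratio formed from the rounded table entries are due to rounding only.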
HR and $\gamma$ are reciprocals in exponential
models:

$\widehat{HR}$ (TRT = 1 vs. 0) $= \exp(-1.527) = 0.22$

$\hat{\gamma}$ (TRT = 1 vs. 0) $= \exp(1.527) = 4.60$

In general

$\gamma > 1$ => exposure benefits
survival

HR > 1 => exposure harmful to
survival


Recall from Section III that the hazard ratio
for the effect of treatment was estimated at
exp(−1.527) = 0.22 using the PH form of the
exponential model. This result illustrates a key
property of the exponential model: the corresponding
acceleration factor and hazard ratio (e.g., TRT =
1 vs. TRT = 0) are reciprocals of each other. This
property is unique to the exponential model. What
can be generalized, however, is that an acceleration
factor greater than one for the effect of an
exposure implies that the exposure is beneficial
to survival whereas a hazard ratio greater
than one implies the exposure is harmful to
survival (and vice versa).
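
The reciprocal relationship follows directly from the two parameterizations of λ used in Sections III and V. The short derivation below is added as a worked step; it uses only the symbols already defined in those sections.

```latex
\begin{align*}
\text{PH form:}\quad & \lambda = \exp(\beta_0 + \beta_1\,\mathrm{TRT}) \\
\text{AFT form:}\quad & \frac{1}{\lambda} = \exp(\alpha_0 + \alpha_1\,\mathrm{TRT})
  \;\Longrightarrow\; \lambda = \exp[-(\alpha_0 + \alpha_1\,\mathrm{TRT})] \\
\text{so}\quad & \beta_0 = -\alpha_0, \qquad \beta_1 = -\alpha_1 \\
\mathrm{HR} &= \exp(\beta_1) = \exp(-\alpha_1) = \frac{1}{\gamma} \\
\widehat{\mathrm{HR}} &= \exp(-1.527) \approx 0.22 \approx \frac{1}{4.60}
\end{align*}
```

This matches the two Stata printouts, in which the coefficients of the PH and AFT forms are equal in magnitude and opposite in sign.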


$\gamma < 1$ => exposure harmful to
survival

HR < 1 => exposure benefits
survival


$\gamma$ = HR = 1 => no effect from
exposure


Exponential PH and AFT models: 

• Same model 

• Different parameterization 

• Same estimates for 

o Survival function 
o Hazard function 
o Median survival 


Although the exponential PH and AFT models focus
on different underlying assumptions, they are
in fact the same model. The only difference is
in their parameterization. The resulting estimates
for the survival function, hazard function, and median
survival do not differ between these models
(see Practice Exercises 6 and 7).
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For those experienced with Poisson 
regression: 

Exponential and Poisson models 

• Assume a constant rate 

• Different data structure 

o Poisson—aggregate counts 
o Exponential—individual 
level 

• Use different outcomes 

o Poisson—number of cases 
o Exponential—time to 
event 

• Yield equivalent parameter 
estimates 

o With same data and same 
covariates in the model 


For those who have experience with Poisson
regression, there is a close connection between the
exponential and Poisson models. Both distributions
assume an underlying constant rate. In fact,
if the data are structured so that all the cases
and the total time at risk are aggregated for each
pattern of covariates (e.g., TRT = 1 and TRT = 0)
and the log of the corresponding person-time at
risk is used as an offset, then a Poisson model will
yield parameter estimates equivalent to those from
the exponential PH model. The difference is that the
random outcome for the Poisson model is the count
of events over a fixed amount of time at risk, whereas
the random outcome for the exponential model is
the time (at risk) to event.
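
The equivalence described above can be sketched numerically. The Python fragment below is only an illustration: the event counts and person-time totals are assumed inputs (chosen so that the log rate ratio is close to the -1.527 quoted earlier), and the function names are hypothetical. It shows that the exponential and Poisson log likelihoods differ only by a term that does not involve the rate, so both are maximized at events divided by person-time.

```python
import math

# Assumed aggregate data per treatment group: (number of events, person-time at risk).
# These totals are illustrative inputs for this sketch.
groups = {0: (21, 182.0), 1: (9, 359.0)}

def exponential_loglik(rate, events, person_time):
    """Exponential log likelihood; it depends on the individual times only
    through the event count and the total time at risk."""
    return events * math.log(rate) - rate * person_time

def poisson_loglik(rate, events, person_time):
    """Poisson log likelihood for the aggregated count with mean rate * person_time
    (a log link with log(person_time) as an offset)."""
    mean = rate * person_time
    return events * math.log(mean) - mean - math.lgamma(events + 1)

# Both likelihoods are maximized at rate = events / person_time.
rates = {trt: d / pt for trt, (d, pt) in groups.items()}
beta0 = math.log(rates[0])               # log rate for TRT = 0
beta1 = math.log(rates[1] / rates[0])    # log rate ratio, TRT = 1 vs. TRT = 0
print(f"beta0 = {beta0:.3f}, beta1 = {beta1:.3f}")
```

Because the two log likelihoods differ by a constant in the rate, any optimizer applied to either one would return the same coefficient estimates for the same data and covariates.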


Exponential model is special case
of Weibull model

We continue with the remission data example and
present the more general Weibull model, which
includes the exponential model as a special case.
In the next section we also show a graphical
approach for evaluating the appropriateness of the
Weibull (and thus also the exponential) model.


VI. Weibull Example 

Weibull Model: 

Hazard function: $h(t) = \lambda p t^{p-1}$
(where $p > 0$ and $\lambda > 0$)

p is a shape parameter 

• p > 1 hazard increases over 
time 

• p = 1 constant hazard 
(exponential model) 

• p < 1 hazard decreases over 
time 


The Weibull model is the most widely used
parametric survival model. Its hazard function is
$h(t) = \lambda p t^{p-1}$, where $p > 0$ and $\lambda > 0$. As with the
exponential model, $\lambda$ will be reparameterized with
regression coefficients. The additional parameter
$p$ is called a shape parameter and determines the
shape of the hazard function. If $p > 1$ then the
hazard increases as time increases. If $p = 1$ then
the hazard is constant and the Weibull model
reduces to the exponential model ($h(t) = \lambda$). If $p < 1$
then the hazard decreases over time. The addition
of this shape parameter gives the Weibull model
greater flexibility than the exponential model, yet
the hazard function remains relatively simple
(basically a scaling of $t$ raised to some fixed power).
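
To make the role of the shape parameter concrete, the short Python sketch below evaluates the Weibull hazard at a few time points for three values of p. The scale value and the time points are hypothetical, chosen only for illustration.

```python
def weibull_hazard(t, lam, p):
    """Weibull hazard h(t) = lam * p * t**(p - 1), for t > 0, lam > 0, p > 0."""
    return lam * p * t ** (p - 1)

lam = 0.05                      # hypothetical scale value
times = [1.0, 5.0, 10.0, 20.0]  # hypothetical time points

increasing = [weibull_hazard(t, lam, 1.5) for t in times]   # p > 1
constant = [weibull_hazard(t, lam, 1.0) for t in times]     # p = 1 (exponential)
decreasing = [weibull_hazard(t, lam, 0.5) for t in times]   # p < 1

for label, values in (("p = 1.5", increasing), ("p = 1.0", constant), ("p = 0.5", decreasing)):
    print(label, [round(h, 4) for h in values])
```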


Additional shape parameter offers 
greater flexibility 





Unique property for Weibull model
AFT $\Rightarrow$ PH and PH $\Rightarrow$ AFT
Holds if $p$ is fixed

HR vs. AFT

Hazard ratio $\Rightarrow$ Comparison of rates

Acceleration factor $\Rightarrow$ Effect on
survival


Useful Weibull property: 

• $\ln[-\ln S(t)]$ is linear with $\ln(t)$

• Enables graphical evaluation
using KM survival estimates

Linearity of $\ln(t)$

$S(t) = \exp(-\lambda t^{p})$

$\Rightarrow \ln[-\ln S(t)] = \ln(\lambda) + p \ln(t)$

Intercept = $\ln(\lambda)$, Slope = $p$


The Weibull model has the property that if the 
AFT assumption holds then the PH assump¬ 
tion also holds (and vice versa). This property 
is unique to the Weibull model (Cox and Oakes, 
1984) and holds if p does not vary over different 
levels of covariates. The PH assumption allows for 
the estimation of a hazard ratio enabling a com¬ 
parison of rates among different populations. The 
AFT assumption allows for the estimation of an 
acceleration factor, which can describe the direct 
effect of an exposure on survival time. 

The Weibull model also has another key property:
the log(-log) of S(t) is linear with the log of
time. This allows a graphical evaluation of the
appropriateness of a Weibull model by plotting the
log negative log of the Kaplan-Meier survival
estimates against the log of time.

To see this linear relationship: start with the
Weibull survival function $S(t) = \exp(-\lambda t^{p})$, take
the log of S(t), multiply by negative one, and take
the log again (see left). For the Weibull
distribution, $\ln[-\ln S(t)]$ is a linear function of $\ln(t)$
with slope $p$ and intercept $\ln(\lambda)$. If the slope
equals one then $t$ follows an exponential
distribution.
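
The linear relationship can be checked numerically. The following Python sketch uses hypothetical parameter values and two arbitrary time points, and recovers the slope and intercept of ln[-ln S(t)] against ln(t) for a Weibull survival function.

```python
import math

def weibull_survival(t, lam, p):
    """Weibull survival function S(t) = exp(-lam * t**p)."""
    return math.exp(-lam * t ** p)

def log_neg_log(s):
    """ln[-ln(s)] for a survival probability 0 < s < 1."""
    return math.log(-math.log(s))

lam, p = 0.05, 1.4        # hypothetical parameter values
t1, t2 = 2.0, 9.0         # any two distinct positive times

x1, x2 = math.log(t1), math.log(t2)
y1 = log_neg_log(weibull_survival(t1, lam, p))
y2 = log_neg_log(weibull_survival(t2, lam, p))

slope = (y2 - y1) / (x2 - x1)     # recovers p
intercept = y1 - slope * x1       # recovers ln(lam)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, ln(lam) = {math.log(lam):.3f}")
```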


Remission data: evaluate Weibull
assumption for TRT = 1 and TRT = 0

[Figure: $\ln[-\ln S(t)]$ plotted against $\ln(t)$; horizontal axis: log of time]


We again return to the remission data and evalu¬ 
ate the appropriateness of the Weibull assumption 
for the treated (TRT = 1) and placebo (TRT = 0) 
groups. On the left is the plot of the log nega¬ 
tive log Kaplan-Meier survival estimates against 
the log of time for TRT = 1 and TRT = 0. Both 
plots look reasonably straight suggesting that the 
Weibull assumption is reasonable. Furthermore, 
the lines appear to have the same slope (i.e., are 
parallel, same p) suggesting that the PH (and 
thus the AFT) assumptions hold. If this common 
slope equals one (i.e., p = 1), then survival time 
follows an exponential distribution. The Weibull 
model output containing the parameter estimates 
includes a statistical test for the hypothesis p = 1 
or equivalently for ln(p) = 0 (for testing the expo¬ 
nential assumption). This is examined later in this 
section. 
Summary of possible results for plot
of $\ln[-\ln S(t)]$ against $\ln(t)$

1. Parallel straight lines $\Rightarrow$ Weibull,
PH, and AFT assumptions hold

2. Parallel straight lines with slope
of 1 $\Rightarrow$ Exponential, PH, and AFT

3. Parallel but not straight lines $\Rightarrow$
PH but not Weibull, not AFT (can
use Cox model)

4. Not parallel and not straight $\Rightarrow$
Not Weibull, PH violated

5. Not parallel but straight lines $\Rightarrow$
Weibull holds, but PH and AFT
violated, different $p$


Previous plot suggests Weibull and 
PH assumption reasonable for TRT 


Weibull PH model:
$h(t) = \lambda p t^{p-1}$

where $\lambda = \exp(\beta_0 + \beta_1 \mathrm{TRT})$.

Hazard ratio (TRT = 1 vs. TRT = 0)
$= \frac{\exp(\beta_0 + \beta_1)\, p t^{p-1}}{\exp(\beta_0)\, p t^{p-1}}$
$= \exp(\beta_1)$


On the left is a summary of five possible re¬ 
sults from an examination of the log negative log 
Kaplan-Meier survival estimates plotted against 
the log of time for two or more levels of covariates. 
The key points are that straight lines support the 
Weibull assumption and parallel curves sup¬ 
port the PH assumption. If the plots are parallel 
but not straight then the PH assumption holds but 
not the Weibull. Assessing whether the curves are 
parallel is a familiar approach for evaluating the 
PH assumption in a Cox model (see Chapter 4 and 
Computer Appendix). An interesting scenario oc¬ 
curs if the lines are straight but not parallel. In this 
situation the Weibull assumption is supported but 
the PH and AFT assumptions are violated. If the 
lines are not parallel, then p is not constant across 
levels of covariates. In Section IX of this chapter, 
we present a method for modeling the shape pa¬ 
rameter p as a function of predictor variables, but 
typically p is assumed fixed. 

An examination of the plot on the previous page 
suggests that the Weibull and PH assumptions are 
reasonable for treatment (TRT). First the PH form 
of the model is presented and then the AFT form. 

The Weibull hazard function is $h(t) = \lambda p t^{p-1}$. A
Weibull PH model is defined by reparameterizing
$\lambda$ as $\exp(\beta_0 + \beta_1 \mathrm{TRT})$. The hazard ratio
is obtained by substituting TRT = 1 and TRT = 0
into the hazard functions (see left). After canceling
we obtain the familiar result $\exp(\beta_1)$. Note that
this result depends on $p$ having the same value for
TRT = 1 and TRT = 0; otherwise time ($t$) would
not cancel in the expression for the HR (i.e., the PH
assumption would not be satisfied).
Remission Data

Weibull regression log relative-hazard form

_t      Coef.    Std. Err.   z       P>|z|
trt     -1.731   .413        -4.19   0.000
_cons   -3.071   .558        -5.50   0.000
/ln_p   .312     .147        2.12    0.034
p       1.366    .201
1/p     .732     .109


On the left is output (Stata version 7.0) from
running the PH form of the Weibull model. There
are parameter estimates for the coefficient of TRT,
the intercept (called _cons), and for three forms of
the shape parameter: p, 1/p, and log(p). The
estimate for p is 1.366, suggesting an increase in the
hazard as survival time increases (because p > 1).
A statistical test for H0: log(p) = 0 yields a p-value
of 0.034. At a significance level of 0.05 we would
reject the null and decide p is not equal to 1,
suggesting that the exponential model is not
appropriate.


Weibull PH

HR (TRT = 1 vs. 0) = exp(-1.731)
= 0.18

95% CI = exp[-1.731 ± 1.96(0.413)]
= (0.08, 0.40)

Weibull: HR = 0.18
Exponential: HR = 0.22
Suggests preventive effect of TRT


An estimated hazard ratio of 0.18 is obtained by
exponentiating the estimated coefficient (-1.731)
of the TRT variable. The 95% confidence interval
for this HR is calculated to be (0.08, 0.40),
indicating a significant preventive effect of treatment.
These results are similar to those obtained from
the exponential model, in which the estimated
hazard ratio was 0.22.
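
The arithmetic behind the hazard ratio and its confidence interval can be reproduced with a few lines of Python. This is a minimal sketch; the helper name `ratio_with_ci` is hypothetical, and the inputs are the coefficient and standard error for trt from the output above.

```python
import math

def ratio_with_ci(coef, se, z=1.96):
    """Exponentiate a log-scale coefficient and its Wald interval coef +/- z * se."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)

# Coefficient and standard error for trt from the Weibull PH output.
hr, lower, upper = ratio_with_ci(-1.731, 0.413)
print(f"HR = {hr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The same helper applies to any coefficient reported on the log scale, such as the AFT coefficient whose exponential is an acceleration factor.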


Comparing Cox and Weibull PH
models

Cox: estimate $\beta_1$
$h(t) = h_0(t) \exp(\beta_1 \mathrm{TRT})$

baseline hazard $h_0(t)$ unspecified


It can be instructive to compare the Cox and
Weibull PH models. The Cox PH model with
treatment as the only predictor is stated as
$h(t) = h_0(t) \exp(\beta_1 \mathrm{TRT})$. There is one parameter to
estimate ($\beta_1$) and the distribution of the baseline
hazard ($h_0(t)$) remains unspecified.


Weibull: estimate $\beta_0$, $\beta_1$, $p$
$h(t) = \lambda p t^{p-1}$ where
$\lambda = \exp(\beta_0 + \beta_1 \mathrm{TRT})$.
$h(t) = [\exp(\beta_0)\, p t^{p-1}] \exp(\beta_1 \mathrm{TRT})$.

baseline hazard specified parametrically


With some manipulation, the Weibull PH model
can also be expressed as a product of a baseline
hazard and $\exp(\beta_1 \mathrm{TRT})$ (see left). There are three
parameters to estimate, $\beta_0$, $\beta_1$, and $p$, that fully
specify the hazard.
$S(t) = \exp(-\lambda t^{p})$

solve for $t$

$t = [-\ln S(t)]^{1/p} \times \frac{1}{\lambda^{1/p}}$

let $\frac{1}{\lambda^{1/p}} = \exp(\alpha_0 + \alpha_1 \mathrm{TRT})$

$t = [-\ln S(t)]^{1/p} \times \exp(\alpha_0 + \alpha_1 \mathrm{TRT})$

Scaling of $t$


An AFT model can also be formulated with the
Weibull distribution. We develop the AFT
parameterization similarly to that done with the
exponential model, by solving for $t$ in terms of a fixed $S(t)$.
The Weibull survival function is $S(t) = \exp(-\lambda t^{p})$.
Taking the natural log, multiplying by negative
1, raising to the power $1/p$, and then multiplying
by the reciprocal of $\lambda^{1/p}$ yields the expression
for $t$ shown on the left. By reparameterizing
$1/\lambda^{1/p} = \exp(\alpha_0 + \alpha_1 \mathrm{TRT})$, it can be seen that the
predictor variable TRT is used to scale the time to
any fixed value of $S(t)$ (see left).


Let $S(t) = q$

$t = [-\ln(q)]^{1/p} \times \exp(\alpha_0 + \alpha_1 \mathrm{TRT})$

Median survival time ($q = 0.5$)

$t_m = [-\ln(0.5)]^{1/p} \times \exp(\alpha_0 + \alpha_1 \mathrm{TRT})$

The expression for $t$ is restated on the left in terms
of any fixed probability $S(t) = q$. For example, to
find an expression for the median survival time $t_m$,
substitute $q = 0.5$ (see left).


Acceleration factor, $\gamma$ (TRT = 1 vs. TRT = 0)

$\gamma = \frac{[-\ln(q)]^{1/p} \exp(\alpha_0 + \alpha_1)}{[-\ln(q)]^{1/p} \exp(\alpha_0)}$

$= \exp(\alpha_1)$

The acceleration factor $\gamma$ is obtained as the ratio of
the times to $S(t) = q$ for TRT = 1 and for TRT = 0.
After canceling, $\gamma$ reduces to $\exp(\alpha_1)$. As with the
PH form of the model, this result depends on $p$ not
varying by treatment status; otherwise $\gamma$ would
depend on $q$.


Remission Data

Weibull regression accelerated failure-time form

_t      Coef.    Std. Err.   z       P>|z|
trt     1.267    .311        4.08    0.000
_cons   2.248    .166        13.55   0.000
/ln_p   .312     .147        2.12    0.034
p       1.366    .201
1/p     .732     .109


Output from running a Weibull AFT model is 
shown on the left. The estimates for each form of 
the shape parameter (p, 1/p, and ln(p)) are the 
same as obtained from the previously shown PH 
form of the model. 

The estimated acceleration factor of 3.55 is
obtained by exponentiating the estimated
coefficient (1.267) of the TRT variable. The 95%
confidence interval for $\gamma$ is calculated to be (1.93,
6.53).
Weibull AFT:

$\gamma$ (TRT = 1 vs. 0) = exp(1.267)
= 3.55

95% CI = exp[1.267 ± 1.96(0.311)]
= (1.93, 6.53)

Weibull: $\gamma$ = 3.55
Exponential: $\gamma$ = 4.60 (assumes
$h(t) = \lambda$)


These results suggest that the median (or any other 
quantile of) survival time is increased by a fac¬ 
tor of 3.55 for those receiving the treatment com¬ 
pared to the placebo. Recall that the acceleration 
factor was estimated at 4.60 using the exponen¬ 
tial model. However, the exponential model uses 
a much stronger assumption: that the hazards are 
constant. 
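
The statement that every quantile of survival time is scaled by the same factor can be illustrated with the estimates from the Weibull AFT output. The Python sketch below (the function name `time_to_quantile` is hypothetical) computes the time at which S(t) = q for each treatment group at several values of q and prints the ratio of the two times.

```python
import math

# Estimates from the Weibull AFT output (intercept, trt coefficient, shape).
ALPHA0, ALPHA1, P = 2.248, 1.267, 1.366

def time_to_quantile(q, trt):
    """Time t at which S(t) = q: t = [-ln(q)]**(1/p) * exp(alpha0 + alpha1 * trt)."""
    return (-math.log(q)) ** (1 / P) * math.exp(ALPHA0 + ALPHA1 * trt)

ratios = []
for q in (0.75, 0.5, 0.25):
    t0, t1 = time_to_quantile(q, 0), time_to_quantile(q, 1)
    ratios.append(t1 / t0)
    print(f"q = {q}: placebo {t0:.1f}, treated {t1:.1f}, ratio {t1 / t0:.2f}")
```

The ratio is the same for every q because the term involving q cancels, which is the sense in which the acceleration factor describes the whole survival distribution and not just the median.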


Relating Weibull AFT and 
PH coefficients 

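AFT: $\lambda^{1/p} = \exp[-(\alpha_0 + \alpha_1 \mathrm{TRT})]$
$(1/p) \ln \lambda = -(\alpha_0 + \alpha_1 \mathrm{TRT})$

$\ln \lambda = -p(\alpha_0 + \alpha_1 \mathrm{TRT})$

PH: $\lambda = \exp(\beta_0 + \beta_1 \mathrm{TRT})$

$\ln \lambda = \beta_0 + \beta_1 \mathrm{TRT}$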


Corresponding coefficients obtained from the PH
and AFT forms of the Weibull models are related
as follows: $\beta_j = -\alpha_j p$ for the jth covariate. This
can most easily be seen by formulating the
parameterization equivalently in terms of $\ln(\lambda)$ for both
the PH and AFT forms of the model, as shown on
the left.


Relationship of coefficients:

$\beta_j = -\alpha_j p$ so that
$\beta = -\alpha$ for exponential ($p = 1$)

Relating estimates for TRT
(PH vs. AFT)

-1.731 = (-1.267)(1.366)


This relationship is illustrated using the
coefficient estimates we obtained for TRT: -1.731 =
(-1.267)(1.366). Note that for the exponential model,
in which $p = 1$, the PH and AFT coefficients are
related by $\beta = -\alpha$.
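
As a quick check of this relationship, the Python sketch below multiplies each AFT coefficient from the output by the negative of the shape estimate and compares the result with the reported PH coefficient. The dictionary names are hypothetical; the numbers are those shown in the two Weibull outputs.

```python
p = 1.366                              # shape estimate (same in both forms)
aft = {"trt": 1.267, "_cons": 2.248}   # AFT coefficients from the output
ph = {"trt": -1.731, "_cons": -3.071}  # PH coefficients from the output

# beta_j = -alpha_j * p, compared with the reported PH estimates.
implied_ph = {name: -alpha * p for name, alpha in aft.items()}
for name in aft:
    print(f"{name}: implied {implied_ph[name]:.3f}, reported {ph[name]:.3f}")
```

The same relation holds for the intercept because both forms are parameterizations of the same ln(lambda).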


Next: log-logistic model
• Hazard may be nonmonotonic

Weibull model
• Hazard does not change
direction


In the next example the log-logistic model is
presented. In contrast to the Weibull, the hazard
function for the log-logistic distribution allows for
some nonmonotonic behavior.


VII. Log-Logistic Example 


Log-logistic hazard: $h(t) = \frac{\lambda p t^{p-1}}{1 + \lambda t^{p}}$
(where $p > 0$ and $\lambda > 0$)


The log-logistic distribution accommodates an
AFT model but not a PH model. Its hazard
function is shown on the left. The shape parameter is
$p$ ($> 0$).
Shape of hazard function:

p < 1 hazard decreases over time 
p > 1 hazard first increases and then 
decreases over time (unimodal) 


If p < 1 then the hazard decreases over time. If 
p > 1, however, the hazard increases to a maxi¬ 
mum point and then decreases over time. In this 
case (p > 1), the hazard function is said to be uni¬ 
modal. 
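
The two shapes can be seen by evaluating the log-logistic hazard directly. In the Python sketch below the value of lambda and the time grid are hypothetical. The location of the peak for p > 1 is not given in the text; it follows from setting the derivative of h(t) to zero, which gives t = ((p - 1)/lambda)^(1/p).

```python
def loglogistic_hazard(t, lam, p):
    """Log-logistic hazard h(t) = lam * p * t**(p - 1) / (1 + lam * t**p)."""
    return lam * p * t ** (p - 1) / (1 + lam * t ** p)

lam = 0.01                          # hypothetical value
times = [0.5, 1, 2, 5, 10, 20, 50]  # hypothetical time grid

h_decreasing = [loglogistic_hazard(t, lam, 0.8) for t in times]   # p < 1
h_unimodal = [loglogistic_hazard(t, lam, 2.0) for t in times]     # p > 1

# For p > 1, dh/dt = 0 at t = ((p - 1) / lam) ** (1 / p); here that is t = 10.
peak_time = ((2.0 - 1) / lam) ** (1 / 2.0)
print("p = 0.8:", [round(h, 4) for h in h_decreasing])
print("p = 2.0:", [round(h, 4) for h in h_unimodal], "peak near t =", round(peak_time, 2))
```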


Log-logistic modeling assumptions: 



PO: Odds ratio constant over time 


Unlike the Weibull model, a log-logistic AFT model 
is not a PH model. However, the log-logistic AFT 
model is a proportional odds (PO) model. A pro¬ 
portional odds survival model is a model in 
which the odds ratio is assumed to remain con¬ 
stant over time. This is analogous to a propor¬ 
tional hazard model where the hazard ratio is as¬ 
sumed constant over time. 


Survival odds

$\frac{S(t)}{1 - S(t)} = \frac{P(T > t)}{P(T \le t)}$


The survival odds is the odds of surviving beyond
time $t$ (i.e., $S(t)/(1 - S(t))$). This is the probability
of not getting the event by time $t$ divided by the
probability of getting the event by time $t$.


Failure odds by time $t$

$\frac{1 - S(t)}{S(t)} = \frac{P(T \le t)}{P(T > t)}$


The failure odds is the odds of getting the event by
time $t$ (i.e., $(1 - S(t))/S(t)$), which is the reciprocal
of the survival odds (see left).


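Log-logistic survival and failure
functions

$S(t) = \frac{1}{1 + \lambda t^{p}}$

$1 - S(t) = \frac{\lambda t^{p}}{1 + \lambda t^{p}}$


The log-logistic survival function ($S(t)$) and failure
function ($1 - S(t)$) are shown on the left.


Failure odds

$\frac{1 - S(t)}{S(t)} = \frac{\lambda t^{p} / (1 + \lambda t^{p})}{1 / (1 + \lambda t^{p})} = \lambda t^{p}$


The failure odds simplifies in a log-logistic model
to $\lambda t^{p}$ (see left).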


Log-logistic PO model:

• Reparameterize $\lambda$ in terms of
Xs and $\beta$s

A log-logistic proportional odds model can be
formulated by reparameterizing $\lambda$ in terms of
predictor variables and regression parameters. We come
back to this point later in this section.
Log Odds Is Linear with $\ln(t)$

log(failure odds) $= \ln(\lambda t^{p})$
$= \ln(\lambda) + p \ln(t)$

Intercept $= \ln(\lambda)$, slope $= p$


The log of the failure odds is $\ln(\lambda t^{p})$, which can
be rewritten as $\ln(\lambda) + p \ln(t)$. In other words,
the log odds of failure is a linear function of
the log of time with slope $p$ and intercept $\ln(\lambda)$.
This is a useful result enabling a graphical
evaluation of the appropriateness of the log-logistic
distribution.


Evaluate log-logistic assumption
graphically

• Plot $\ln\left(\frac{1 - \hat{S}(t)}{\hat{S}(t)}\right)$ against $\ln(t)$

• If log-logistic, then plot is linear
with slope $= p$


The log-logistic assumption can be graphically
evaluated by plotting $\ln[(1 - \hat{S}(t))/\hat{S}(t)]$ against
$\ln(t)$, where $\hat{S}(t)$ are the Kaplan-Meier survival
estimates. If survival time follows a log-logistic
distribution, then the resulting plot should be a
straight line of slope $p$.


Alternatively

• Plot $\ln\left(\frac{\hat{S}(t)}{1 - \hat{S}(t)}\right)$ against $\ln(t)$

• If log-logistic, then plot is linear
with slope $= -p$


We could alternatively plot the log of the survival
odds, $\ln[\hat{S}(t)/(1 - \hat{S}(t))]$, against $\ln(t)$. If the
log-logistic assumption is correct, the resulting plot
should be a straight line of slope $-p$.
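
One way to produce the points for such a plot is sketched below in Python. The follow-up times and event indicators are invented for illustration (they are not the remission data), and `kaplan_meier` is a hypothetical helper that implements the usual product-limit calculation. The output is a list of (ln t, ln failure odds) pairs that could be passed to any plotting tool; negating the second coordinate gives the log survival odds.

```python
import math

def kaplan_meier(times, events):
    """Kaplan-Meier estimates at each distinct event time.
    times: follow-up times; events: 1 = event, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk, surv, out = len(data), 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        ties = sum(1 for tt, _ in data if tt == t)
        if deaths:
            surv *= 1 - deaths / at_risk
            out.append((t, surv))
        at_risk -= ties          # everyone observed at t leaves the risk set
        i += ties
    return out

# Hypothetical follow-up data, invented for this sketch.
times = [2, 3, 3, 5, 7, 8, 11, 12, 15, 20]
events = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]

km = kaplan_meier(times, events)
points = [
    (math.log(t), math.log((1 - s) / s))    # (ln t, ln failure odds)
    for t, s in km
    if 0 < s < 1                            # log odds undefined at s = 0 or 1
]
for x, y in points:
    print(f"ln(t) = {x:.3f}, ln(failure odds) = {y:.3f}")
```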


Remission Data 

WBCCAT: white blood cell count 
variable medium = 1 vs. high = 2 


We next consider a different variable from the re¬ 
mission data: a dichotomous variable for white 
blood cell count (WBCCAT) coded medium = 1 
and high = 2. 


[Figure: $\ln\left[\hat{S}(t)/(1 - \hat{S}(t))\right]$ plotted against $\ln(t)$ for WBCCAT = 1 and WBCCAT = 2; horizontal axis: log of time]


On the left is the plot of the log odds of survival 
(obtained from the Kaplan-Meier survival esti¬ 
mates) against the log of time comparing medium 
(WBCCAT = 1) and high (WBCCAT = 2) blood 
cell counts. The points for WBCCAT = 1 lie above 
the points for WBCCAT = 2 indicating that the 
survival odds are higher for those with a medium 
white blood cell count compared to high. The lines 
look reasonably straight and parallel, at least until 
the estimated odds of survival approaches zero. 

If we accept the proposition that the lines look 
straight, then the log-logistic assumption is rea¬ 
sonable. Because the lines look parallel, the pro¬ 
portional odds (PO) assumption is also reason¬ 
able. If the PO assumption holds in a log-logistic 
model then the AFT assumption also holds. 
Straight lines $\Rightarrow$ Log-logistic

Parallel plots $\Rightarrow$ PO

Log-logistic and PO $\Rightarrow$ AFT


The key points from above are:

a. straight lines support the log-logistic
assumption,

b. parallel curves support the PO
assumption, and

c. if the log-logistic and PO assumptions
hold, then the AFT assumption also
holds.


Log-logistic and Weibull graphical 
approach analogous 

• Check PH for Weibull 

• Check PO for log-logistic 


The graphical evaluation for the log-logistic as¬ 
sumption is analogous to the graphical analysis of 
the Weibull assumption presented in the last sec¬ 
tion, except here the PO assumption rather than 
the PH assumption is evaluated by checking for 
parallel lines. 


AFT log-logistic model

$S(t) = \frac{1}{1 + \lambda t^{p}} = \frac{1}{1 + (\lambda^{1/p} t)^{p}}$


Next we consider an AFT log-logistic model with 
white blood cell count as the only predictor com¬ 
paring WBCCAT = 2 (high count) and WBCCAT 
= 1 (medium count). 


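solve for $t$ to obtain

$t = \left[\frac{1}{S(t)} - 1\right]^{1/p} \times \frac{1}{\lambda^{1/p}}$

let $\frac{1}{\lambda^{1/p}} = \exp(\alpha_0 + \alpha_1 \mathrm{WBCCAT})$

$t = \left[\frac{1}{S(t)} - 1\right]^{1/p} \times \exp(\alpha_0 + \alpha_1 \mathrm{WBCCAT})$

Scaling of $t$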


We develop the AFT parameterization by
solving for $t$ in terms of a fixed $S(t)$. Starting
with the expression for $S(t)$, taking reciprocals,
subtracting 1, raising to the power $1/p$, and then
multiplying by the reciprocal of $\lambda^{1/p}$ yields the
expression for $t$ shown on the left. By
reparameterizing $1/\lambda^{1/p} = \exp(\alpha_0 + \alpha_1 \mathrm{WBCCAT})$, we allow
the predictor variable WBCCAT to be used for the
multiplicative scaling of time to any fixed value of
$S(t)$ (see left).


Let $S(t) = q$

$t = [q^{-1} - 1]^{1/p} \times \exp(\alpha_0 + \alpha_1 \mathrm{WBCCAT})$

Median survival time ($q = 0.5$):

$t_m = [2 - 1]^{1/p} \times \exp(\alpha_0 + \alpha_1 \mathrm{WBCCAT})$


The expression for $t$ is restated on the left in terms
of any fixed probability $S(t) = q$. For example, to
find an expression for the median survival time $t_m$,
substitute $q = 0.5$ (see left).
Acceleration factor,
$\gamma$ (WBCCAT = 2 vs. WBCCAT = 1)

$\gamma = \frac{[q^{-1} - 1]^{1/p} \exp(\alpha_0 + 2\alpha_1)}{[q^{-1} - 1]^{1/p} \exp(\alpha_0 + 1\alpha_1)}$

$= \exp(\alpha_1)$

The acceleration factor $\gamma$ is found by taking the
ratio of the times to $S(t) = q$ for WBCCAT = 2
and for WBCCAT = 1. After canceling, $\gamma$ reduces
to $\exp(\alpha_1)$.


Log-logistic regression accelerated
failure-time form

_t        Coef.    Std. Err.   z       P>|z|
wbccat    -.871    .296        -2.94   0.003
_cons     3.495    .498        7.09    0.000
/ln_gam   -.779    .164        -4.73   0.000
gamma     .459     .0756

p = 1/(0.459) = 2.18


The output from running the AFT log-logistic
model is shown on the left. The coefficient
estimate for WBCCAT is -0.871, which is statistically
significant with a p-value of 0.003 (far right
column of output).

Stata provides estimates of the reciprocal of
p (gamma = 1/p) rather than for p. The estimate
for gamma is 0.459. Therefore, the estimate for p
is 1/(0.459) = 2.18.


WBCCAT = 2 vs. WBCCAT = 1
(log-logistic):

$\gamma$ = exp(-0.871) = 0.42
95% CI for $\gamma$ = exp[-0.871 ± 1.96(0.296)]
= (0.23, 0.75)


An estimate of the acceleration factor $\gamma$
comparing WBCCAT = 2 to WBCCAT = 1 is found by
exponentiating the estimate -0.871 of $\alpha_1$ to obtain
0.42. The 95% confidence interval for $\gamma$ is
calculated to be (0.23, 0.75).


Comparing estimated survival

$S_1(t) = S_2(0.42t)$

where $S_1(t)$ is the survival function for
WBCCAT = 1 and $S_2(t)$ is the survival
function for WBCCAT = 2


These results suggest that the time for going out of
remission is "accelerated" for patients with a high
white blood cell count compared to those with a
medium count by an estimated factor of 0.42. In
terms of the survival functions estimated from this
model, $S_1(t) = S_2(0.42t)$, where $S_1(t)$ and $S_2(t)$ are
the respective survival functions for patients with
medium and high blood cell counts.
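
The relationship between the two survival functions can be verified numerically from the fitted estimates. The Python sketch below builds the log-logistic survival function from the AFT output (the function name is hypothetical) and uses the unrounded acceleration factor exp(-0.871) rather than 0.42, so that the two sides agree to machine precision.

```python
import math

# Estimates from the log-logistic AFT output.
ALPHA0, ALPHA1 = 3.495, -0.871
P = 1 / 0.459                      # Stata reports gamma = 1/p

def loglogistic_survival(t, wbccat):
    """S(t) = 1 / (1 + lam * t**p), with 1 / lam**(1/p) = exp(alpha0 + alpha1 * wbccat)."""
    scaled_t = t * math.exp(-(ALPHA0 + ALPHA1 * wbccat))   # lam**(1/p) * t
    return 1 / (1 + scaled_t ** P)

accel = math.exp(ALPHA1)           # about 0.42
pairs = []
for t in (5.0, 10.0, 20.0):
    s1 = loglogistic_survival(t, 1)
    s2 = loglogistic_survival(accel * t, 2)
    pairs.append((s1, s2))
    print(f"t = {t}: S1(t) = {s1:.4f}, S2(accel * t) = {s2:.4f}")
```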


Failure odds

$\frac{1 - S(t)}{S(t)} = \frac{\lambda t^{p} / (1 + \lambda t^{p})}{1 / (1 + \lambda t^{p})} = \lambda t^{p}$

where $\lambda = \exp(\beta_0 + \beta_1 \mathrm{WBCCAT})$

OR (WBCCAT = 2 vs. WBCCAT = 1)

$= \frac{t^{p} \exp(\beta_0 + 2\beta_1)}{t^{p} \exp(\beta_0 + 1\beta_1)} = \exp(\beta_1)$


The proportional odds form of the log-logistic
model can also be formulated by reparameterizing
$\lambda$. Recall that the log-logistic failure odds is
$\lambda t^{p}$.

By setting $\lambda = \exp(\beta_0 + \beta_1 \mathrm{WBCCAT})$, an odds
ratio comparing WBCCAT = 2 to WBCCAT = 1 can
be calculated (see left). After canceling, the odds
ratio reduces to $\exp(\beta_1)$.
Comparing AFT and PO
(log-logistic)

Relationship of coefficients:

$\beta_j = -\alpha_j p$


The corresponding coefficients for log-logistic PO
and AFT models are related by $\beta_j = -\alpha_j p$ for the
jth covariate. This result is obtained using an
argument similar to that presented for the Weibull
example in the previous section.


Since $\alpha_1 = -0.871$ and $p = 2.18$,
then

$\beta_1 = -(-0.871)(2.18) = 1.90$

and

OR = exp(1.90) = 6.69


The estimate for $\alpha_1$ in the AFT model is -0.871
and the estimate for $p$ is 2.18. Therefore, an
estimate for $\beta_1$ can be found by multiplying
-(-0.871) times 2.18, yielding 1.90. An estimate of
the odds ratio is found by exponentiating this
estimate, exp(1.90) = 6.69. (Unfortunately, neither
Stata nor SAS estimates the proportional odds
form of the model.)
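
Since the proportional odds form is not estimated directly, the conversion from the AFT estimates can be scripted. The Python sketch below follows the arithmetic in the text, including rounding the coefficient to two decimals before exponentiating; the variable names are hypothetical.

```python
import math

alpha1 = -0.871            # AFT coefficient for WBCCAT from the output
p = 2.18                   # 1 / gamma, rounded as in the text

beta1 = round(-alpha1 * p, 2)      # PO coefficient, rounded to two decimals as in the text
odds_ratio = math.exp(beta1)

print(f"beta1 = {beta1:.2f}, OR = {odds_ratio:.2f}")
```

Exponentiating the unrounded product would give a slightly different second decimal, so the rounding step matters if the goal is to reproduce the reported value.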


VIII. A More General Form of 
the AFT Model 

Exponential: $S(t) = \exp(-\lambda t)$

• AFT Form: $\frac{1}{\lambda} = \exp(\alpha_0 + \alpha_1 \mathrm{TRT})$

• PH Form: $\lambda = \exp(\beta_0 + \beta_1 \mathrm{TRT})$


On the left is a summary of the models discussed 
in the previous sections. These models were 
formulated by reparameterizing the survival 
(and hazard) functions in terms of regression 
parameters and predictor variables. 

An advantage for stating the models in this 
form is that the interpretation and relationships 
between parameters are specific to each distribu¬ 
tion. 


Weibull: $S(t) = \exp(-\lambda t^{p})$

• AFT Form: $\frac{1}{\lambda^{1/p}} = \exp(\alpha_0 + \alpha_1 \mathrm{TRT})$

• PH Form: $\lambda = \exp(\beta_0 + \beta_1 \mathrm{TRT})$


However, there are more general ways these
models could be stated. The Cox PH model is
a more general way of stating the proportional
hazards model. In this section we discuss a more
general formulation of the AFT model.


Log-logistic: $S(t) = \frac{1}{1 + \lambda t^{p}}$

• AFT Form: $\frac{1}{\lambda^{1/p}} = \exp(\alpha_0 + \alpha_1 \mathrm{WBCCAT})$

• PO Form: $\lambda = \exp(\beta_0 + \beta_1 \mathrm{WBCCAT})$








General Form of AFT Model
(One Predictor)

$\ln(T) = \alpha_0 + \alpha_1 \mathrm{TRT} + \epsilon$

where $\epsilon$ is random error


Consider an AFT model with one predictor (TRT)
in which $T$ represents a random variable for
survival time. The model can be expressed on the log
scale as shown on the left, where $\epsilon$ is random error
following some distribution.


With additional parameter
$\ln(T) = \alpha_0 + \alpha_1 \mathrm{TRT} + \sigma\epsilon$

$\sigma$ scales the error


Some distributions have an additional parameter
($\sigma$) scaling $\epsilon$. The model including this additional
parameter can be expressed as shown on the left,
where the random error $\epsilon$ is multiplied by a scale
parameter $\sigma$.


If $\epsilon \sim N(0, 1)$, then

$\ln(T) \sim N(\mu = \alpha_0 + \alpha_1 \mathrm{TRT},\ sd = \sigma)$

Similar to linear regression (except
for inclusion of censorships)


If $\epsilon$ follows a standard normal distribution and
$\ln(T) = \alpha_0 + \alpha_1 \mathrm{TRT} + \sigma\epsilon$, then $\ln(T)$ would
follow a normal distribution with mean
$\mu = \alpha_0 + \alpha_1 \mathrm{TRT}$ and standard deviation $\sigma$. For this
situation, the model would look like a standard
linear regression. The key difference between fitting
this survival model and a standard linear
regression is the inclusion of censored observations
in the data.


In general,

$\mu_{\ln(T)} \ne (\alpha_0 + \alpha_1 \mathrm{TRT})$, $sd \ne \sigma$

Interpretation of parameters
depends on distribution


In general, for other distributions, the mean of
$\ln(T)$ is not $\alpha_0 + \alpha_1 \mathrm{TRT}$ and its standard deviation
is not $\sigma$. In other words, it should not be assumed
that the mean of $\epsilon$ is 0 and the standard deviation
is 1. The interpretation of the parameters depends
on the underlying distribution.


Let $\sigma = \frac{1}{p}$, then

$\ln(T) = \alpha_0 + \alpha_1 \mathrm{TRT} + \frac{1}{p}\epsilon$


Sometimes the model is parameterized using $\sigma = 1/p$.
The model can then be restated by replacing
$\sigma\epsilon$ with $(1/p)\epsilon$.


Additive model in terms of ln(T) 
but 

multiplicative model in terms of T 


The AFT model is additive on the log scale but 
a multiplicative model with respect to T. 


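$T = \exp\left(\alpha_0 + \alpha_1 \mathrm{TRT} + \frac{1}{p}\epsilon\right)$

$= \exp[(\alpha_0 + \alpha_1 \mathrm{TRT})] \times \exp\left(\frac{1}{p}\epsilon\right)$


In particular, the model can be expressed in terms
of $T$ by exponentiating $\ln(T)$, as shown on the left.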
Collapse $\alpha_0$ into baseline term
$T_0 = \exp(\alpha_0) \exp\left(\frac{1}{p}\epsilon\right)$

so that $T = \exp(\alpha_1 \mathrm{TRT}) \times T_0$
where $T_0$ is a random variable for
TRT = 0


The model may also be expressed by collapsing the
intercept into a baseline random term $T_0$ (see left).
In this setting $T_0$ is a random variable representing
the survival time of the placebo group (TRT = 0).
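
The multiplicative role of exp(alpha_1) can be illustrated by simulation. The Python sketch below uses hypothetical parameter values and assumes the error term follows the standard extreme minimum value distribution, which is the Weibull case in the comparison of distributions for T and ln(T) in this section. Each simulated baseline time is paired with the time the same error draw would give under TRT = 1.

```python
import math
import random

ALPHA0, ALPHA1, P = 2.0, 0.7, 1.5      # hypothetical parameter values

def draw_error(rng):
    """One draw from the standard extreme minimum value distribution,
    obtained by inverse transform: eps = ln(-ln(U)), U ~ Uniform(0, 1)."""
    u = rng.random()
    while u == 0.0:                     # guard against log(0)
        u = rng.random()
    return math.log(-math.log(u))

def survival_time(trt, eps):
    """T = exp(alpha0 + alpha1 * trt + (1/p) * eps)."""
    return math.exp(ALPHA0 + ALPHA1 * trt + eps / P)

rng = random.Random(0)                  # fixed seed so the sketch is repeatable
errors = [draw_error(rng) for _ in range(5)]
baseline = [survival_time(0, e) for e in errors]       # T_0
treated = [survival_time(1, e) for e in errors]        # exp(alpha1) * T_0

for t0, t1 in zip(baseline, treated):
    print(f"T0 = {t0:.2f}, T = {t1:.2f}, ratio = {t1 / t0:.3f}")
```

Every ratio equals exp(alpha_1), regardless of the error draw, which is the sense in which TRT rescales the baseline survival time.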


AFT model may be expressed in
terms of T or ln(T)

Comparing Distributions: T and
ln(T)

T              ln(T)
Exponential    Extreme minimum value
Weibull        Extreme minimum value
Log-logistic   Logistic
Lognormal      Normal


In summary, an AFT model may be expressed by 
reparameterizing a specific distribution, or may 
be more generally expressed either in terms of a 
random variable T (for survival time), or ln(T). 
If T follows a Weibull distribution then ln(T) fol¬ 
lows a distribution called the extreme minimum 
value distribution (see table on left). Similarly, if 
T follows a log-logistic or lognormal distribution 
then ln(T) follows a logistic or normal distribu¬ 
tion, respectively. The logistic and normal distri¬ 
butions are similarly shaped, and are both sym¬ 
metric about their mean. 


IX. Other Parametric Models

In the previous sections we presented examples of
the exponential, Weibull, and log-logistic models.
In this section we briefly discuss some other
parametric survival models.


Generalized Gamma Model 

• Supported by SAS and Stata 

• S(t), h(t) expressed in terms 
of integrals 

• Contains three parameters 

• Weibull, lognormal are special 
cases 


The generalized gamma model is a parametric
survival model that is supported by both SAS and
Stata software. The hazard and survival functions
for this model are complicated and can only be
expressed in terms of integrals. The generalized
gamma distribution has three parameters, allowing
for great flexibility in its shape. The Weibull
and lognormal distributions are special cases of
the generalized gamma distribution (see Practice
Exercises 12 to 14).


Lognormal Model 

Similar to log-logistic 
Difference: 

Log-logistic: AFT and PO 
Lognormal: AFT but not PO 


The lognormal model also has relatively
complicated hazard and survival functions that can only
be expressed in terms of integrals. The shape of
the lognormal distribution is very similar to the
log-logistic distribution and yields similar model
results. A difference is that although the
lognormal model accommodates an accelerated failure
time model, it is not a proportional odds model.
Gompertz Model

• PH model but not AFT

• One predictor (TRT) in model:

$h(t) = [\exp(\gamma t)] \times \exp(\beta_0 + \beta_1 \mathrm{TRT})$

$h_0(t) = \exp(\gamma t)$ (parametrically
specified)

$\gamma > 0$ hazard exponentially
increases with $t$
$\gamma < 0$ hazard exponentially
decreases with $t$
$\gamma = 0$ constant hazard
(exponential model)


Parametric models need not be AFT models. The
Gompertz model is a parametric proportional
hazards model but not an AFT model. The model
can be expressed in a form similar to that of a
Cox PH model except that the baseline hazard is
specified as the hazard of a Gompertz distribution
containing a shape parameter $\gamma$ (see left).

If $\gamma > 0$ then the hazard exponentially increases
over time. If $\gamma < 0$ then the hazard exponentially
decreases over time. If $\gamma = 0$ then the hazard is
constant and reduces to the exponential model.
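
A brief Python sketch, with hypothetical coefficient values, shows that the Gompertz hazard ratio for TRT does not depend on time even though the hazard itself changes exponentially with t.

```python
import math

def gompertz_hazard(t, trt, gamma, beta0, beta1):
    """h(t) = exp(gamma * t) * exp(beta0 + beta1 * trt)."""
    return math.exp(gamma * t) * math.exp(beta0 + beta1 * trt)

gamma, beta0, beta1 = 0.1, -3.0, -1.2      # hypothetical values
ratios = [
    gompertz_hazard(t, 1, gamma, beta0, beta1) / gompertz_hazard(t, 0, gamma, beta0, beta1)
    for t in (1.0, 5.0, 25.0)
]
print([round(r, 4) for r in ratios], "exp(beta1) =", round(math.exp(beta1), 4))
```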


AFT model: multiplicative

$T = \exp(\alpha_0 + \alpha_1 \mathrm{TRT} + \epsilon)$
$= \exp(\alpha_0) \times \exp(\alpha_1 \mathrm{TRT}) \times \exp(\epsilon)$
but
additive on log scale:
$\ln(T) = \alpha_0 + \alpha_1 \mathrm{TRT} + \epsilon$

The AFT model is a multiplicative model (i.e., a
multiplicative scaling of failure time). It becomes
an additive model on the log scale (see left side).


Additive failure time model
$T = \alpha_0 + \alpha_1 \mathrm{TRT} + \epsilon$

$T$ rather than $\log(T)$ is linear with
TRT


An alternative parametric model is to define an
additive failure time model in terms of $T$.
Consider the model $T = \alpha_0 + \alpha_1 \mathrm{TRT} + \epsilon$. Now $T$,
rather than $\ln(T)$, is expressed as a linear function
of the regression parameters. Stated in statistical
language: the log link function is omitted from this
failure time model. SAS supports such an additive
failure time model (see Computer Appendix).


Modeling the Shape Parameter 
(e.g., Weibull and log-logistic) 

Typical Weibull model
$h(t) = \lambda p t^{p-1}$

where $\lambda = \exp(\beta_0 + \beta_1 \mathrm{TRT})$

$p$ unaffected by predictors


Many parametric models contain an extra shape 
(or ancillary) parameter beyond the regression 
parameters. For example, the Weibull and log- 
logistic models contain a shape parameter p. 
Typically, this parameter is considered fixed, un¬ 
affected by changes in the values of predictor 
variables. 
Alternative Weibull model
models the ancillary parameter $p$

$h(t) = \lambda p t^{p-1}$

where $\lambda = \exp(\beta_0 + \beta_1 \mathrm{TRT})$
$p = \exp(\delta_0 + \delta_1 \mathrm{TRT})$

Not a PH or AFT model if $\delta_1 \ne 0$
but still a Weibull model


An alternative approach is to model the shape
parameter in terms of predictor variables and
regression coefficients. In the Weibull model shown on
the left, both $\lambda$ and $p$ are modeled as functions of
treatment status (TRT). If $\delta_1$ is not equal to zero,
then the value of $p$ differs by TRT. For that
situation, the PH (and thus the AFT) assumption is
violated because $t^{p-1}$ will not cancel in the hazard
ratio for TRT (see Practice Exercises 15 to 17).
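
The consequence of letting p depend on TRT can be seen by computing the hazard ratio at several times. In the Python sketch below all coefficient values are hypothetical; with delta_1 = 0 the hazard ratio is constant, while with delta_1 not equal to zero it changes with t.

```python
import math

def weibull_hazard(t, trt, beta0, beta1, delta0, delta1):
    """Weibull hazard with lam = exp(beta0 + beta1*trt) and p = exp(delta0 + delta1*trt)."""
    lam = math.exp(beta0 + beta1 * trt)
    p = math.exp(delta0 + delta1 * trt)
    return lam * p * t ** (p - 1)

def hazard_ratio(t, **coefs):
    """Hazard ratio for TRT = 1 vs. TRT = 0 at time t."""
    return weibull_hazard(t, 1, **coefs) / weibull_hazard(t, 0, **coefs)

# Hypothetical coefficient values.
fixed_shape = dict(beta0=-3.0, beta1=-1.5, delta0=0.3, delta1=0.0)
varying_shape = dict(beta0=-3.0, beta1=-1.5, delta0=0.3, delta1=0.4)

hr_fixed = [hazard_ratio(t, **fixed_shape) for t in (1.0, 5.0, 20.0)]
hr_varying = [hazard_ratio(t, **varying_shape) for t in (1.0, 5.0, 20.0)]
print("delta1 = 0:  ", [round(h, 3) for h in hr_fixed])
print("delta1 = 0.4:", [round(h, 3) for h in hr_varying])
```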


Choosing appropriate model 

• Evaluate graphically 
o Exponential 

o Weibull 
o Log-logistic 

• Akaike’s information criterion 
o Compares model fit 

o Uses —2 log likelihood 


Choosing the most appropriate parametric model
can be difficult. We have provided graphical
approaches for evaluating the appropriateness of
the exponential, Weibull, and log-logistic models.
Akaike’s information criterion (AIC) provides
an approach for comparing the fit of models with
different underlying distributions, making use of
the -2 log likelihood statistic (described in
Practice Exercises 11 and 14).
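
As a rough sketch of how such a comparison might be carried out, the code below uses the usual definition AIC = -2 log L + 2k, where k is the number of estimated parameters. The log likelihoods and parameter counts are hypothetical, not results from the text.

```python
def aic(log_likelihood, n_params):
    """Akaike's information criterion: -2 log L + 2 * (number of parameters)."""
    return -2.0 * log_likelihood + 2.0 * n_params

# Hypothetical (log likelihood, number of parameters) for three candidate models.
candidates = {
    "exponential": (-112.4, 2),
    "weibull": (-108.9, 3),
    "log-logistic": (-109.6, 3),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}

# The model with the smallest AIC is preferred.
best = min(scores, key=scores.get)
```

The penalty term 2k is what allows models with different numbers of parameters, and different underlying distributions, to be placed on a common scale.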


X. The Parametric Likelihood 

• Function of observed data and 
unknown parameters 

• Based on outcome distribution 
f(t) 

• Censoring complicates survival 
data 

o Right-censored 
o Left-censored 
o Interval-censored 


The likelihood for any parametric model is a
function of the observed data and the model’s
unknown parameters. The form of the likelihood
is based on the probability density function f(t)
of the outcome variable. A complication of
survival data is the possible inclusion of censored
observations (i.e., observations in which the
exact time of the outcome is unobserved). We
consider three types of censored observations:
right-censored, left-censored, and interval-censored.


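Examples of Censored Subjects

[Figure: three time lines; X marks the unobserved event time.
Right-censored: X lies to the right of time 10.
Left-censored: X lies to the left of time 10.
Interval-censored: X lies between times 8 and 10.]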


Right-censored. Suppose a subject is lost to
follow-up after 10 years of observation. The time
of event is not observed because it happened
after the 10th year. This subject is right-censored at
10 years because the event happened to the right
of 10 on the time line (i.e., t > 10).

Left-censored. Suppose a subject had an event
before the 10th year but the exact time of the event
is unknown. This subject is left-censored at 10 years
(i.e., t < 10).

Interval-censored. Suppose a subject had an
event between the 8th and 10th year (exact time
unknown). This subject is interval-censored (i.e.,
8 < t < 10).

Formulating the Likelihood

Barry, Gary, Larry, ..., Outcome
Distribution f(t)

Subject   Event Time                      Likelihood Contribution
Barry     t = 2                           $f(2)$
Gary      t > 8 (right-censored)          $\int_8^\infty f(t)\,dt$
Harry     t = 6                           $f(6)$
Carrie    t < 2 (left-censored)           $\int_0^2 f(t)\,dt$
Larry     4 < t < 8 (interval-censored)   $\int_4^8 f(t)\,dt$


The table on the left illustrates how the likelihood
is formulated for data on five subjects. We assume
a probability density function f(t) for the outcome.
Barry gets the event at time t = 2. His contribution
to the likelihood is f(2). Gary is right-censored
at t = 8. The probability that Gary gets the event
after t = 8 is found by integrating f(t) from 8 to
infinity. This is Gary’s contribution to the likelihood.
Harry gets the event at time t = 6. His contribution
to the likelihood is f(6). Carrie is left-censored
at t = 2. Her contribution to the likelihood is
obtained by integrating f(t) from zero to 2. Finally,
Larry is interval-censored between t = 4 and
t = 8. His contribution to the likelihood is found by
integrating f(t) from 4 to 8.


Likelihood (L)

Product of individual contributions

$L = f(2) \times \int_8^\infty f(t)\,dt \times f(6) \times \int_0^2 f(t)\,dt \times \int_4^8 f(t)\,dt$

(Barry x Gary x Harry
x Carrie x Larry)


The full likelihood (L) is found by taking the
product of each subject's independent contribution
to the likelihood. The likelihood for this example
is shown on the left.


Assumptions for formulating L 

• No competing risks 

o Competing event does not 
prohibit event of interest 
o Death from all causes is
classic example of no
competing risk


The formulation of this likelihood uses the
assumption of no competing risks. In other words,
we assume that no competing event will prohibit
any subject from eventually getting the event of
interest (see Chapter 9). Death from all causes is
the classic example of an outcome that in reality
has no competing risk. For other outcomes, the no
competing risk assumption is more of a
theoretical construct.


• Subjects independent 

o Allows L to be formulated as 
product of subjects’ 
contributions 


Another assumption is that individual
contributions to the likelihood are independent. This
assumption allows the full likelihood to be
formulated as the product of each individual's
contribution.

• Follow-up time continuous
o No gaps in follow-up


Revisit example with Barry, Gary, 

Larry,... 

f(t) is Weibull 

SMOKE is only predictor 

1 = Smoker 
0 = Nonsmoker 

Weibull: $h(t) = \lambda p t^{p-1}$,

$S(t) = \exp(-\lambda t^p)$

$f(t) = h(t)S(t)$
$f(t) = \lambda p t^{p-1} \exp(-\lambda t^p)$
where $\lambda = \exp(\beta_0 + \beta_1 SMOKE)$

(PH form of the model)


Data Layout for Right-, Left-, and
Interval-Censoring Using SAS

ID       LOWER       UPPER       SMOKE
Barry    2           2           1
Gary     8           (missing)   0
Harry    6           6           0
Carrie   (missing)   2           0
Larry    4           8           1

Right-censored: UPPER missing
Left-censored: LOWER missing
Interval-censored: LOWER < UPPER
Not censored: LOWER = UPPER


A third assumption is that each subject’s follow-up
time is continuous without gaps (i.e., once
subjects are out of the study, they do not return). If
gaps are allowed, the likelihood can be modified
to accommodate such a scenario.

In the last example, we did not specify the
probability density f(t), nor did we specify any
covariates. We revisit this example, assuming f(t) is
Weibull with one predictor SMOKE in the model
(coded 1 for smokers and 0 for nonsmokers).


The Weibull hazard and survival functions are
shown on the left. The probability density
function f(t) is the product of the hazard and survival
functions. The parameterization will use the
proportional hazards (PH) form of the Weibull model:
$\lambda = \exp(\beta_0 + \beta_1 SMOKE)$.


On the left is the data layout for running
parametric models containing right-, left-, and
interval-censored data in a form suitable for using the
SAS procedure PROC LIFEREG (version 8.2).
There are two time variables LOWER and UPPER.
Barry got the event at t = 2, so both LOWER and
UPPER get the value 2. Gary was right-censored
at 8 (t > 8) so LOWER gets the value 8 and
UPPER is set to missing. Carrie is left-censored at 2
(t < 2) so LOWER is set to missing and UPPER
gets the value 2. Larry was interval-censored with
LOWER = 4 and UPPER = 8. Barry and Larry
are smokers whereas Gary, Harry, and Carrie are
nonsmokers.

Weibull Likelihood (L)

Product of individual contributions

$L = f(2) \times \int_8^\infty f(t)\,dt \times f(6) \times \int_0^2 f(t)\,dt \times \int_4^8 f(t)\,dt$

$L = \exp(\beta_0 + \beta_1)\, p\, 2^{p-1} \exp(-\exp(\beta_0 + \beta_1)\, 2^p)$
$\times \int_8^\infty \exp(\beta_0)\, p\, t^{p-1} \exp(-\exp(\beta_0)\, t^p)\,dt$
$\times \exp(\beta_0)\, p\, 6^{p-1} \exp(-\exp(\beta_0)\, 6^p)$
$\times \int_0^2 \exp(\beta_0)\, p\, t^{p-1} \exp(-\exp(\beta_0)\, t^p)\,dt$
$\times \int_4^8 \exp(\beta_0 + \beta_1)\, p\, t^{p-1} \exp(-\exp(\beta_0 + \beta_1)\, t^p)\,dt$


The full likelihood using the Weibull distribution
can now be formulated as a product of each
individual’s contribution (shown on the left). We have
used a small dataset (5 subjects) for ease of
illustration but the process can be generalized for any
number of subjects.
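
The construction of this likelihood from the LOWER/UPPER data layout can be sketched in code. The example below is only an illustration under stated assumptions: `None` stands in for a missing value, the function names are our own, and the parameter values in the last line are hypothetical rather than estimates. Each integral of f(t) is written in closed form using S(t), since the integral of f(t) from a to b equals S(a) - S(b).

```python
import math

def weibull_surv(t, lam, p):
    """S(t) = exp(-lam * t**p)."""
    return math.exp(-lam * t ** p)

def weibull_pdf(t, lam, p):
    """f(t) = h(t) * S(t) = lam * p * t**(p - 1) * exp(-lam * t**p)."""
    return lam * p * t ** (p - 1) * weibull_surv(t, lam, p)

def contribution(lower, upper, lam, p):
    """Likelihood contribution for one subject; None marks a missing bound."""
    if lower is None:      # left-censored: integral of f from 0 to upper
        return 1.0 - weibull_surv(upper, lam, p)
    if upper is None:      # right-censored: integral of f from lower to infinity
        return weibull_surv(lower, lam, p)
    if lower == upper:     # not censored: exact event time
        return weibull_pdf(lower, lam, p)
    # interval-censored: integral of f from lower to upper
    return weibull_surv(lower, lam, p) - weibull_surv(upper, lam, p)

# (ID, LOWER, UPPER, SMOKE), following the data layout shown in the text.
data = [
    ("Barry", 2, 2, 1),
    ("Gary", 8, None, 0),
    ("Harry", 6, 6, 0),
    ("Carrie", None, 2, 0),
    ("Larry", 4, 8, 1),
]

def log_likelihood(b0, b1, p):
    """ln L for the PH Weibull model with lam = exp(b0 + b1 * SMOKE)."""
    total = 0.0
    for _id, lower, upper, smoke in data:
        lam = math.exp(b0 + b1 * smoke)
        total += math.log(contribution(lower, upper, lam, p))
    return total

# Hypothetical parameter values, used only to evaluate the function once.
example_lnL = log_likelihood(b0=-3.0, b1=0.5, p=1.2)
```

Working on the log scale turns the product of contributions into a sum, which is the form that maximization routines typically use.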


Obtaining maximum likelihood
estimates

Solve system of equations:

$\frac{\partial \ln(L)}{\partial \beta_j} = 0, \quad j = 1, 2, \ldots, N$

where N = # of parameters


Once the likelihood is formulated, the question
becomes: which values of the regression
parameters would maximize L? The process of maximizing
the likelihood is typically carried out by setting the
partial derivative of the natural log of L to zero and
then solving the system of equations (called the
score equations). The parameter estimates (e.g.,
$p$, $\beta_0$, $\beta_1$) that maximize L are the maximum
likelihood estimates.
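
For the Weibull model the score equations generally have to be solved iteratively. To show the idea in the simplest case, the sketch below uses an exponential model (a Weibull with p = 1) with no covariates and right-censoring only, for which the single score equation has a closed-form solution. The data are hypothetical and the function names are our own.

```python
import math

# Hypothetical right-censored data: (follow-up time, 1 = event, 0 = censored).
records = [(2.0, 1), (8.0, 0), (6.0, 1), (3.0, 1), (5.0, 0)]

def log_lik(lam):
    """ln L for an exponential model: events contribute f(t) = lam * exp(-lam * t),
    censored subjects contribute S(t) = exp(-lam * t)."""
    return sum(d * math.log(lam) - lam * t for t, d in records)

def score(lam):
    """Derivative of ln L with respect to lam: (number of events) / lam - total time."""
    events = sum(d for _t, d in records)
    total_time = sum(t for t, _d in records)
    return events / lam - total_time

# Setting the score to zero gives: lam_hat = events / total follow-up time.
lam_hat = sum(d for _t, d in records) / sum(t for t, _d in records)
```

Evaluating `score(lam_hat)` returns zero, and `log_lik` is smaller at values of `lam` on either side of `lam_hat`.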


XI. Interval-Censored Data 

Parametric likelihood 

• Handles right-, left-, or 
interval-censored data 

Cox likelihood

• Designed to handle
right-censored data


One advantage of a parametric model compared
to a Cox model is that the parametric likelihood
easily accommodates right-, left-, or
interval-censored data. The Cox likelihood, by contrast,
easily handles right-censored data but does not
directly accommodate left- or interval-censored
data.


Interval-censored study design

• Check for nonsymptomatic 
outcome once a year 

• If outcome newly detected, 
exact time occurred during 
previous year 

• Left-censoring special case of 
interval-censoring 

o Zero the lower boundary of 
the interval 

Parametric model can be fitted 

• f(t) specified 

• Contribution to likelihood for 
each subject 

o Integrate f(t) over event 
interval 

Binary regression 

• Alternative approach 

for interval-censored data 

• Outcome coded 

o 0 if subject survives interval 
o 1 if subject gets event during 
interval 

• Useful approach if 

o Ample number of events in 
each interval 

o Prefer not to specify f(t) 


Sometimes the design of a study is such that all
the data are interval-censored. For example,
consider a study in which healthcare workers
examine subjects once a year, checking for a
nonsymptomatic outcome. If an event was first detected
at the beginning of the third year, then the
outcome occurred at some unknown time between
the second and third years. In this framework
left-censoring can be considered a special
case of interval-censoring with zero as the lower
boundary of the interval.

A parametric model can easily be fitted using the
methods described in the previous section. Once a
distribution for the outcome, f(t), is specified, each
subject’s contribution to the likelihood is obtained
by integrating f(t) over the interval in which he or
she had the event.


A binary regression (e.g., logistic regression) is
an alternative approach that may be considered
if all the data are interval-censored. With this
method the outcome variable can be coded zero
if the subject survives the interval and coded one
if the subject gets the event during the interval.
This approach is particularly useful if there is an
ample number of events in each interval and the
analyst prefers not to specify a distribution f(t) for
continuous survival time.


Information on Three Subjects 

Subject 1: Gets event in first interval 

Subject 2: Survives first interval 

Survives second interval 
Gets event in third 
interval 

Subject 3: Survives first interval 
Gets event in second 
interval 


For illustration, consider a small dataset
containing three subjects. Subject 1 gets the event in the
first interval of follow-up, subject 2 gets the event
in the third interval, and subject 3 gets the event
in the second interval of follow-up.

Data Layout for Binary Regression

SUBJECT  EVENT  D1  D2  D3  TRT
1        1      1   0   0   1
2        0      1   0   0   0
2        0      0   1   0   0
2        1      0   0   1   0
3        0      1   0   0   1
3        1      0   1   0   1


EVENT: dichotomous outcome 
coded 1 if event, 0 for no event 
during the interval 


$D_1$, $D_2$, $D_3$: dummy variables for
intervals 1, 2, and 3 coded 1 if in the
corresponding interval, 0 otherwise

TRT: Treatment coded 1 for new 
treatment, 0 for placebo 


The data layout is shown on the left. Each
observation represents one interval of follow-up
time allowing multiple observations per subject.
EVENT is the dichotomous outcome variable.
Subject 1 had the event in the first interval
(EVENT = 1) and thus has one observation.
Subject 2 has three observations because she
survived the first two intervals (EVENT = 0) but
got the event in the third interval. $D_1$ is a dummy
variable coded 1 if the observation represents
the first interval and 0 otherwise. Similarly, $D_2$ is
coded 1 for the second interval and $D_3$ is coded 1
for the third interval.

TRT is the predictor of interest, coded 1 for
the new treatment and 0 for the placebo. TRT
could be coded as a time-independent or
time-dependent variable. In this example, TRT is
time-independent because TRT does not change
values over different intervals corresponding to
the same subject.
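
The expansion from one record per subject to one record per interval can be sketched as follows. This is an illustrative helper of our own (the name `expand` is hypothetical), and it assumes every subject gets the event in some interval; a subject who survived all intervals would need rows with EVENT = 0 throughout, which this sketch does not handle.

```python
def expand(subject, event_interval, trt, n_intervals=3):
    """Return one row per interval at risk: (SUBJECT, EVENT, D1, ..., Dn, TRT)."""
    rows = []
    for k in range(1, event_interval + 1):
        # EVENT is 1 only in the interval where the event occurs.
        event = 1 if k == event_interval else 0
        # Dummy variable j is 1 when the row represents interval j.
        dummies = [1 if j == k else 0 for j in range(1, n_intervals + 1)]
        rows.append((subject, event, *dummies, trt))
    return rows

# (subject, interval in which the event occurred, TRT) for the three subjects.
subjects = [(1, 1, 1), (2, 3, 0), (3, 2, 1)]
layout = [row for s in subjects for row in expand(*s)]
```

The six rows in `layout` correspond to the six observations in the data layout: one for subject 1, three for subject 2, and two for subject 3.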


Logistic Model

$\text{logit}\, P(Y = 1) = \beta_1 D_1 + \beta_2 D_2 + \beta_3 D_3 + \beta_4 TRT$

where P(Y = 1) is the probability
of event for a given interval
conditioned on survival of previous
intervals


A logistic model (shown at left) containing the
three dummy variables and TRT can be
formulated with the data in this form.


Interpretation of Parameters

$\beta_1$: Log odds of event in 1st interval
among TRT = 0

$\beta_2$: Log odds of event in 2nd interval
given survival of 1st interval
among TRT = 0

$\beta_3$: Log odds of event in 3rd interval
given survival of first two intervals
among TRT = 0

$\beta_4$: Log odds ratio for TRT


Care must be taken with the interpretation of the
parameters: $\beta_1$ is the log odds of the event
occurring in the first interval among the placebo group;
$\beta_2$ is the log odds of the event occurring in the
second interval conditioned on survival of the first
interval among the placebo group; $\beta_3$ is the log odds
of the event occurring in the third interval
conditioned on survival of the first and second intervals
among the placebo group; and $\beta_4$ is the log odds
ratio for TRT.

$D_1$, $D_2$, $D_3$ play similar role as
intercept

• Baseline measure when 
covariates are zero 

• 3 parameters rather than 1 
intercept 

o Baseline measure may differ 
for each interval 

Odds Ratio (TRT = 1 vs. TRT = 0)
$= \exp(\beta_4)$

Model uses PO assumption

• OR constant over time

• PO assumption can be tested
o Include interaction terms
with TRT and dummy
variables
o Significant interaction
suggests PO violation
o Need ample data to
practically carry out test

Alternative Binary Model

$\log(-\log(1 - P(Y = 1))) = \beta_1 D_1 + \beta_2 D_2 + \beta_3 D_3 + \beta_4 TRT$

where 1 - P(Y = 1) is the probability
of surviving a given interval
conditioned on survival of previous
intervals

Complementary log-log link 

• Log-log survival modeled as 
linear function of regression 
parameters 

Logit link 

• Log odds of failure modeled 

as linear function of regression 
parameters 


The dummy variables play a similar role to that
of the intercept in a conventional regression,
providing a baseline outcome measure for the case in
which all predictors are zero (e.g., TRT = 0). In
general, the baseline measure may differ for each
interval, which is the reason that the model
contains 3 dummy variables rather than 1 intercept.


The odds ratio comparing TRT = 1 to TRT = 0 is
obtained by exponentiating $\beta_4$. This model uses
the proportional odds (PO) assumption in that
the odds ratio is assumed constant over time (or
at least constant at the end of each interval). This
assumption can be tested by including interaction
(product) terms with TRT and two of the dummy
variables in the model. A statistically significant
product term would suggest a violation of the PO
assumption. However, if there are sparse data
corresponding to particular intervals, it will not be
practical to carry out such a test on those
intervals.


Logistic regression is not the only type of binary
regression that may be considered for
interval-censored data. An alternative binary model
(shown on the left) uses the complementary
log-log link function rather than the logit link
function that is used for the more familiar logistic
regression.


A model using a complementary log-log link
function expresses the log negative log survival
probability as a linear function of regression
parameters. By contrast, a model using a logit link
function expresses the log odds of failure (i.e.,
getting the event) as a linear function of regression
parameters.

Complementary log-log model is
PH model

• HR (TRT = 1 vs. TRT = 0) =
$\exp(\beta_4)$

• HR constant over time


The complementary log-log binary model is a
proportional hazards model. The hazard ratio
comparing TRT = 1 to TRT = 0 is obtained by
exponentiating $\beta_4$.


Log-log survival curves:
parallel => additive effects
=> PH

Complementary log-log link:
additive effects on log-log scale
=> PH


Recall we can use log-log survival curves to
evaluate the PH assumption for a Cox model. If the
effects are additive (e.g., parallel for TRT = 1 and
TRT = 0) then the PH assumption is assumed to
hold. The underlying idea is similar for the
complementary log-log link function in that additive
effects are assumed on the log-log scale (e.g.,
comparing TRT = 1 to TRT = 0).
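
The contrast between the two link functions can be checked numerically. The sketch below is illustrative only: the coefficient values are hypothetical and the function names are our own. It uses the fact that -log(1 - P) is the cumulative hazard over an interval, so a constant ratio of that quantity corresponds to proportional hazards.

```python
import math

def inv_logit(eta):
    """P(Y = 1) under the logit link: logit P = eta."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_cloglog(eta):
    """P(Y = 1) under the complementary log-log link: log(-log(1 - P)) = eta."""
    return 1.0 - math.exp(-math.exp(eta))

# Hypothetical coefficients: one baseline term per interval, plus a TRT effect.
betas = [-2.0, -1.5, -1.0]   # beta_1, beta_2, beta_3
beta4 = 0.4

odds_ratios, cumhaz_ratios = [], []
for b in betas:
    # Logit link: the odds ratio for TRT equals exp(beta4) in every interval.
    p0, p1 = inv_logit(b), inv_logit(b + beta4)
    odds_ratios.append((p1 / (1 - p1)) / (p0 / (1 - p0)))
    # Complementary log-log link: the ratio of cumulative hazards over the
    # interval, -log(1 - P), equals exp(beta4) in every interval.
    q0, q1 = inv_cloglog(b), inv_cloglog(b + beta4)
    cumhaz_ratios.append(math.log(1 - q1) / math.log(1 - q0))
```

Both lists are constant across the three intervals and equal to exp(0.4), but they describe different measures of effect: an odds ratio under the logit link and a hazard ratio under the complementary log-log link.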


In theory

• Survival time is continuous

In practice

• Survival time measured in
intervals

o If event occurred in month 7
then event occurred in an
interval of time


In theory, the time-to-event variable in survival
analyses is thought of as a continuous variable.
In practice, however, the time variable is typically
an interval of time. For example, if time is
measured in months and an event occurs in month 7
then the event is recorded as having occurred in a
specific interval lasting a month.


Discrete survival analysis

• Discrete time

• For example, number of
menstrual cycles to pregnancy
rather than time to pregnancy
o Fraction of cycle does not
make sense


Discrete survival analysis is a survival analysis
in which the outcome variable is discrete, both in
theory and in practice. For example, consider a
study in which women who stop using oral
contraception are followed until pregnancy. The
outcome is defined as the number of menstrual
cycles until pregnancy. The number of cycles rather
than the time to pregnancy is used because the
cycle length varies among women and a woman
ovulates only once per menstrual cycle (i.e., one
opportunity per cycle to become pregnant). The
number of cycles is a discrete outcome. A fraction
of a cycle does not make sense.

Analyzing discrete survival data

• Can use binary regression 

• Analogous to interval-censored 
data 

o Discrete outcome—subjects 
survive discrete units of time 
o Interval outcomes— 
subjects survive intervals 
of time 


Binary regression, as described in this section,
can be applied for discrete survival outcomes in
a similar manner to that described for
interval-censored outcomes. With this method, subjects
can be conceptualized as surviving discrete units
of time, analogous to subjects surviving
continuous intervals of time.


XII. Frailty Models 

Frailty 

• Random component 

• Accounts for extra variability 
from unobserved factors 


In this section we consider the inclusion of frailty
in a survival model. Frailty is a random
component designed to account for variability due to
unobserved individual-level factors that is
otherwise unaccounted for by the other predictors in
the model.


Conceptualize S(t) two ways: 

• For an individual 

• Averaging over a theoretical 
large population 


Consider a survival model with a continuous age
variable and dichotomous smoking status variable
as the only predictors. Under this model the
survival function for a 33-year-old smoker might be
conceptualized in different ways. One way is as
the survival function for an individual 33-year-old
smoker. The second way is as some kind of
averaging over a theoretical large population of
33-year-old smokers.


With Frailty Component 

Jake and Blake 

1. May have different S(t) due to 
unobserved factors 

2. Extra source of variability in 
outcome (e.g., more variation 
than expected under Weibull) 

Without Frailty Component 

Jake and Blake 

1. Have same S(t) 

2. May have different event times 
because event time is random, 
following some distribution (e.g., 
Weibull) 


Now suppose a “frailty” component is included in
the model. Under this model, we can
conceptualize survival functions specific to each
individual. If Jake and Blake are both 33-year-old
smokers, not only might their observed failure times
be different, but under this model their
individual survival functions could also be different.
Jake may be more “frail” than Blake due to
unobserved factors accounting for individual level
differences in his hazard and survival functions.
These unobserved factors may contribute an
extra layer of heterogeneity, leading to greater
variability in survival times than might be expected
under the model (e.g., Weibull) without the frailty
component.

The frailty component $\alpha$ ($\alpha > 0$)

• Unobserved multiplicative
effect on hazard

• Follows distribution $g(\alpha)$ with
$\mu = 1$

• $Var(g(\alpha)) = \theta$, parameter to be
estimated

Hazard and survival conditioned on
frailty

$h(t|\alpha) = \alpha h(t)$

$S(t|\alpha) = S(t)^{\alpha}$


$\alpha > 1$

• Increased hazard: $\alpha h(t) > h(t)$

• Decreased survival: $S(t)^{\alpha} < S(t)$

$\alpha < 1$

• Decreased hazard: $\alpha h(t) < h(t)$

• Increased survival: $S(t)^{\alpha} > S(t)$

$\alpha = 1$ (average frailty): $\alpha h(t) = h(t)$

Survival functions
(with frailty models)

1. Conditional, $S(t|\alpha)$, individual
level

2. Unconditional, $S_U(t)$, population
level

Unconditional survival function
$S_U(t)$

$S_U(t) = \int_0^\infty S(t|\alpha)\, g(\alpha)\, d\alpha$

$h_U(t) = \frac{-d[S_U(t)]/dt}{S_U(t)}$


The frailty $\alpha$ is an unobserved multiplicative
effect on the hazard function assumed to follow
some distribution $g(\alpha)$ with $\alpha > 0$ and the mean
of $g(\alpha)$ equal to 1. The variance of $g(\alpha)$ is a
parameter $\theta$ (theta) that is typically estimated
from the data.


An individual’s hazard function conditional on the
frailty can be expressed as $\alpha$ multiplied by h(t).
Using the relationship between the survival and
hazard functions, the corresponding conditional
survival function can be expressed as S(t) raised
to the $\alpha$ power.

Individuals with $\alpha > 1$ have an increased hazard
and decreased probability of survival compared to
those of average frailty ($\alpha = 1$). Similarly,
individuals with $\alpha < 1$ have a decreased hazard and
increased probability of survival compared to those
of average frailty.


With frailty models, we distinguish the
individual level or conditional survival function $S(t|\alpha)$
discussed above, from the population level or
unconditional survival function $S_U(t)$, which
represents a population average. Once the frailty
distribution $g(\alpha)$ is chosen, the unconditional survival
function is found by integrating the conditional
survival function $S(t|\alpha)$ times $g(\alpha)$ with
respect to $\alpha$. The corresponding unconditional
hazard $h_U(t)$ can then be found using the
relationship between the survival and hazard functions
(see left).

Frailty distribution $g(\alpha)$, $\alpha > 0$,
$E(\alpha) = 1$

Stata offers choices for $g(\alpha)$

1. Gamma

2. Inverse-Gaussian

Both distributions parameterized in
terms of $\theta$


Any distribution for $\alpha > 0$ with a mean of 1
can theoretically be used for the distribution of
the frailty. Stata supports two distributions: the
gamma distribution and the inverse-Gaussian
distribution for the frailty. With the mean fixed
at 1, both these distributions are parameterized in
terms of the variance $\theta$ and typically yield similar
results.
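
For a gamma frailty the integral defining the unconditional survival function has a well-known closed form, $S_U(t) = [1 - \theta \ln S(t)]^{-1/\theta}$. The sketch below, which is our own illustration with hypothetical input values, checks that closed form against a direct numerical integration of the conditional survival function times the gamma density.

```python
import math

def gamma_frailty_pdf(a, theta):
    """Gamma density with mean 1 and variance theta (shape 1/theta, scale theta)."""
    k = 1.0 / theta
    return a ** (k - 1) * math.exp(-a / theta) / (math.gamma(k) * theta ** k)

def conditional_surv(s, a):
    """S(t | alpha) = S(t) ** alpha, where s is the baseline S(t)."""
    return s ** a

def unconditional_surv_numeric(s, theta, upper=60.0, n=60000):
    """Midpoint-rule approximation of the integral of S(t | alpha) g(alpha) d(alpha).
    The upper limit 60 truncates a negligible tail for the values used here."""
    h = upper / n
    return h * sum(
        conditional_surv(s, (i + 0.5) * h) * gamma_frailty_pdf((i + 0.5) * h, theta)
        for i in range(n)
    )

def unconditional_surv_gamma(s, theta):
    """Closed form for gamma frailty: [1 - theta * ln S(t)] ** (-1 / theta)."""
    return (1.0 - theta * math.log(s)) ** (-1.0 / theta)

# Hypothetical inputs: baseline S(t) = 0.6 at some time t, frailty variance 0.5.
s, theta = 0.6, 0.5
numeric = unconditional_surv_numeric(s, theta)
closed = unconditional_surv_gamma(s, theta)
```

The two values agree closely, and both exceed the survival probability 0.6 of an individual with average frailty ($\alpha = 1$).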


EXAMPLE 


Vet Lung Cancer Trial 
Predictors: 

TX (dichotomous: 1 = standard, 2 = test) 
PERF (continuous: 0 = worst, 100 = best) 
DD (disease duration in months) 

AGE (in years) 

PRIORTX (dichotomous: 0 = none, 

10 = some) 

Model 1. No Frailty

Weibull regression (PH form)
Log likelihood = -206.20418

_t        Coef.    Std. Err.   z       p>|z|
tx        .137     .181        0.76    0.450
perf      -.034    .005        -6.43   0.000
dd        .003     .007        0.32    0.746
age       -.001    .009        -0.09   0.927
priortx   -.013    .022        -0.57   0.566
_cons     -2.758   .742        -3.72   0.000
/ln_p     -.018    .065        -0.27   0.786
p         .982     .064
1/p       1.02     .066




To illustrate the use of a frailty model, we use
the data from the Veterans Administration Lung
Cancer Trial described in Chapter 5. The exposure
of interest is treatment status TX (standard = 1,
test = 2). The control variables are performance
status (PERF), disease duration (DD), AGE, and
prior therapy (PRIORTX), whose coding is shown
on the left. The outcome is time to death (in days).


Output from running a Weibull PH model
without frailty using Stata software is shown on the
left (Model 1). The model can be expressed:
$h(t) = \lambda p t^{p-1}$ where

$\lambda = \exp(\beta_0 + \beta_1 TX + \beta_2 PERF + \beta_3 DD + \beta_4 AGE + \beta_5 PRIORTX)$.

The estimate of the hazard ratio comparing TX = 2
vs. TX = 1 is exp(0.137) = 1.15 controlling for
performance status, disease duration, age, and prior
therapy. The estimate for the shape parameter is
0.982, suggesting a slightly decreasing hazard over
time.
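
The quantities quoted from the Model 1 output can be reproduced directly from the reported coefficients. The sketch below is our own; the Wald-type 95% interval is an addition for illustration and is not reported in the text.

```python
import math

coef_tx, se_tx = 0.137, 0.181   # coefficient and standard error for tx (Model 1)
ln_p = -0.018                   # estimated log of the shape parameter (Model 1)

# Hazard ratio for TX = 2 vs. TX = 1.
hr_tx = math.exp(coef_tx)

# Wald-type 95% interval, formed on the log scale and then exponentiated
# (1.96 is the standard normal quantile).
ci_low = math.exp(coef_tx - 1.96 * se_tx)
ci_high = math.exp(coef_tx + 1.96 * se_tx)

# Shape parameter p is obtained by exponentiating its estimated log.
p_shape = math.exp(ln_p)
```

The interval contains 1, which is consistent with the nonsignificant p-value of 0.450 reported for tx.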
EXAMPLE (continued)

Model 2. With Frailty

Weibull regression (PH form)
Gamma frailty
Log likelihood = -200.11338

_t        Coef.    Std. Err.   z       p>|z|
tx        .105     .291        0.36    0.719
perf      -.061    .012        -5.00   0.000
dd        -.006    .017        -0.44   0.663
age       -.013    .015        -0.87   0.385
priortx   -.006    .035        -0.18   0.859
_cons     -2.256   1.100       -2.05   0.040
/ln_p     .435     .141        3.09    0.002
/ln_the   -.150    .382        -0.39   0.695
p         1.54     .217
1/p       .647     .091
theta     .861     .329

Likelihood ratio test of theta = 0:
chibar2(01) = 12.18
Prob>=chibar2 = 0.000




Model 2 (output on left) is the same Weibull model 
as Model 1 except that a frailty component has 
been included. The frailty in Model 2 is assumed 
to follow a gamma distribution with mean 1 and 
variance equal to theta ($\theta$). The estimate of theta
is 0.861 (bottom row of output). A variance of 
zero (theta = 0) would indicate that the frailty 
component does not contribute to the model. A 
likelihood ratio test for the hypothesis theta = 0 
is shown directly below the parameter estimates 
and indicates a chi-square value of 12.18 with 1 
degree of freedom yielding a highly significant 
p-value of 0.000 (rounded to 3 decimals). 
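
The likelihood ratio statistic can be reproduced from the two log likelihoods in the output. The sketch below is our own illustration. It rests on one assumption that the text does not spell out: that the label chibar2(01) in the output denotes a 50:50 mixture of chi-square distributions with 0 and 1 degrees of freedom, used because theta = 0 lies on the boundary of the parameter space. Under either reading the p-value rounds to 0.000.

```python
import math

ll_model1 = -206.20418   # Weibull PH model without frailty
ll_model2 = -200.11338   # Weibull PH model with gamma frailty

# Likelihood ratio statistic: twice the difference in log likelihoods.
lr_stat = 2.0 * (ll_model2 - ll_model1)

# Upper tail of a chi-square with 1 degree of freedom:
# P(Z**2 > x) = erfc(sqrt(x / 2)) for a standard normal Z.
p_chi2_1 = math.erfc(math.sqrt(lr_stat / 2.0))

# Assumption: chibar2(01) is a 50:50 mixture of chi-square(0) and chi-square(1),
# which halves the tail area.
p_chibar2 = 0.5 * p_chi2_1
```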

Notice how all the parameter estimates 
are altered with the inclusion of the frailty. 

The estimate for the shape parameter is now 1.54, 
quite different from the estimate 0.982 obtained 
from Model 1. The inclusion of frailty not only 
has an impact on the parameter estimates but 
also complicates their interpretation. 


Comparing Model 2 with Model 1 

• There is one additional 
parameter to estimate in 
Model 2 

• The actual values of 
individuals’ frailty are not 
estimated in Model 2 

• The coefficients for the 
predictor variables in Models 1 
and 2 have different estimates 
and interpretation 

• The estimate of the shape 
parameter is < 1.0 for Model 1
and > 1.0 for Model 2 


Before discussing in detail how the inclusion
of frailty influences the interpretation of the
parameters, we summarize some of the key points
(listed on the left) that differentiate Model 2
(containing the frailty) and Model 1.

Model 2 contains one additional parameter,
the variance of the frailty. However, the actual
values of each subject’s frailty are not estimated.
The regression coefficients and Weibull shape
parameter also differ in their interpretations for
Model 2 compared to Model 1. We now elaborate
on these points.

Model 2

Hazard for jth individual:

$h_j(t|\alpha_j) = \alpha_j h(t), \quad j = 1, 2, \ldots, n$

where $h(t) = \lambda p t^{p-1}$
with $\lambda = \exp(\beta_0 + \beta_1 TX + \beta_2 PERF + \beta_3 DD + \beta_4 AGE + \beta_5 PRIORTX)$
and where $\alpha \sim$ gamma ($\mu = 1$,
variance $= \theta$)


For Model 2 we can express the Weibull model
with a gamma distributed frailty in terms of the
individual level hazard for the jth subject.


If $\alpha_j$ denotes the frailty for the jth subject,
then that subject’s hazard $h_j(t|\alpha_j)$ can be
expressed as $\alpha_j$ multiplied by h(t), where h(t) is the
Weibull hazard function parameterized in terms
of the predictor variables and their regression
coefficients (see left).


$\alpha_j$ not estimable

• An $\alpha_j$ associated with each
subject

• Too many parameters


The values for each $\alpha_j$ are not estimable because
there is a level of frailty associated with each data
point. If we tried to estimate each subject’s frailty,
then there would be more parameters to estimate
than observations in the dataset and the model
would be overparameterized.


Rather, $Var[g(\alpha)]$ is estimated
• Gamma is 2-parameter
distribution
o Mean set at 1.0
o $\theta = Var[g(\alpha)]$ is estimated


Rather, the variance of the frailty is estimated. The
gamma distribution is a two-parameter
distribution. Because the mean is set at 1, we need only
estimate its variance to fully specify the frailty
distribution.


Interpreting coefficients in Model 2 

HR $= \exp(\beta_1) = 1.11$

Estimates HR comparing two
individuals

• With same $\alpha$

• One with TX = 2, other with 
TX = 1 

• With same levels of other 
predictors 


The estimated coefficient for TX using Model 2 is
0.105. By exponentiating, we obtain exp(0.105) =
1.11. This is the estimated hazard ratio for two
individuals having the same frailty in which one
takes the test treatment and the other takes the
standard treatment controlling for the other
covariates in the model. Thus, for two individuals
with the same frailty, we can use the coefficient
estimates from Model 2 to estimate the ratio of
conditional hazards.

Recall: $h(t|\alpha) = \alpha h(t)$

TX = 1: $h_1(t|\alpha_1) = \alpha_1 h_1(t)$
TX = 2: $h_2(t|\alpha_2) = \alpha_2 h_2(t)$

If $\frac{h_2(t)}{h_1(t)} = \exp(\beta_1)$

then $\frac{h_2(t|\alpha_2)}{h_1(t|\alpha_1)} = \exp(\beta_1)$
only if $\alpha_1 = \alpha_2$

To clarify, recall that the individual level or
conditional hazard function can be expressed as $\alpha$
multiplied by h(t). Suppose $h_1(t|\alpha_1)$ and $h_2(t|\alpha_2)$ are
the conditional hazard functions for individuals
who use the standard and test treatments,
respectively, at the mean levels of the other covariates. If
the ratio of $h_2(t)$ and $h_1(t)$ equals $\exp(\beta_1)$, then the
ratio of $h_2(t|\alpha_2)$ and $h_1(t|\alpha_1)$ equals $\exp(\beta_1)$ only
if the individuals have the same level of frailty (i.e.,
$\alpha_1 = \alpha_2$; see left).


Another interpretation for $\exp(\beta_1)$

• Ratio of conditional hazards
from the same individual

• Effect for individual taking test
rather than standard treatment


Another way to interpret the exponentiated
coefficient for TX, $\exp(\beta_1)$, is as a ratio of conditional
hazards from the same individual. This measure
can be used to estimate an effect for an individual
taking the test treatment instead of the standard
treatment.


Model 1 (p = 0.982) 

Decreasing hazard for individual 
and population because p < 1 

Model 2 (p = 1.54) 

Complication: 

Individual level hazard 
vs 

Population level hazard 


A somewhat striking difference in the output from
Model 1 and Model 2 is the estimate of the shape
parameter. The hazard from Model 1 (without the
frailty) is estimated to decrease over time
because p < 1. By contrast, the individual level
hazard from Model 2 is estimated to
increase over time because p > 1. However, the
interpretation of the shape parameter in Model 2
has an additional complication that should be
considered before making direct comparisons with
Model 1. For frailty models, we have to
distinguish between the individual level and
population level hazards.


For Model 2 

Conditional hazard increases 
but 

unconditional hazard unimodal 


Although the individual level or conditional
hazard from Model 2 is estimated to increase,
the estimated population level or unconditional
hazard does not strictly increase. The
unconditional hazard first increases but then
decreases to zero, resulting in a unimodal shape due
to the effect of the frailty.

Estimated unconditional hazard
Model 2 (TX = 1, mean level for
other covariates, p = 1.54)

[Figure: estimated unconditional hazard plotted against analysis time (Weibull regression)]

On the left is a plot (from Model 2) of the
estimated unconditional hazard for those on standard
treatment (TX = 1) with mean values for
the other covariates. The graph is unimodal, with
the hazard first increasing and then decreasing
over time. So each individual has an estimated
increasing hazard (p = 1.54), yet the hazard
averaged over the population is unimodal, rather
than increasing. How can this be?

The answer is that the population is comprised
of individuals with different levels of
frailty. The more frail individuals ($\alpha > 1$) have
a greater hazard and are more likely to get the
event earlier. Consequently, over time, the “at risk
group” has an increasing proportion of less frail
individuals ($\alpha < 1$), decreasing the population
average, or unconditional, hazard.


Four increasing individual level
hazards, but average hazard decreases
from $t_1$ to $t_2$



To clarify the above explanation, consider the
graph on the left in which the hazards for four
individuals increase linearly over time until their
event occurs. The two individuals with the highest
hazards failed between times $t_1$ and $t_2$ and the
other two failed after $t_2$. Consequently, the average
hazard ($h_2$) of the two individuals still at risk
at $t_2$ is less than the average hazard ($h_1$) of the four
individuals at risk at $t_1$. Thus the average hazard
of the “at risk” population decreased from $t_1$ to
$t_2$ (i.e., $h_2 < h_1$) because the individuals surviving
past $t_2$ were less frail than the two individuals who
failed earlier.
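
The four-individual illustration can be reproduced with a few lines of arithmetic. The sketch below is only an illustration under assumed values: the slopes and the times t1 and t2 are hypothetical numbers chosen so that the two individuals with the steepest hazards fail between t1 and t2.

```python
# Hypothetical slopes: individual i has hazard h_i(t) = slope_i * t,
# so every individual level hazard increases linearly over time.
slopes = [4.0, 3.0, 2.0, 1.0]
t1, t2 = 1.0, 1.5   # hypothetical time points

def average_hazard(at_risk_slopes, t):
    """Average hazard at time t over the individuals still at risk."""
    return sum(s * t for s in at_risk_slopes) / len(at_risk_slopes)

h1 = average_hazard(slopes, t1)       # all four individuals at risk at t1
h2 = average_hazard(slopes[2:], t2)   # the two with the highest hazards have failed
print(h1, h2)
```

Each individual hazard is larger at t2 than at t1, yet the average over those still at risk is smaller at t2, which is the frailty effect in miniature.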


Frailty Effect 

$h_U(t)$ eventually decreases
because 

“at risk group” becoming less frail 
over time 


This property, in which the unconditional hazard 
eventually decreases over time because the “at risk 
group” has an increasing proportion of less frail 
individuals, is called the frailty effect. 
Unconditional hazard $h_U(t)$ with
gamma frailty

$$h_U(t) = \frac{h(t)}{1 - \theta \ln[S(t)]}$$

If $\theta = 0$ then $h_U(t) = h(t)$
(no frailty)


The unconditional hazard function $h_U(t)$, with
gamma frailty, is shown on the left.

If $\theta = 0$, then $h_U(t)$ reduces to $h(t)$, indicating that
there is no frailty.


For Model 2:

• h(t) and S(t) are Weibull

• At t = 0

o $h_U(t) = h(t)$ (increasing)

• As t gets large

o If $\theta > 0$ then $h_U(t) \to 0$

• So $h_U(t)$ increases and then
decreases (unimodal)


An examination of the expression for $h_U(t)$ gives
further insight into how we obtained an estimated
unconditional hazard of unimodal shape. S(t) and
h(t) represent the survival and hazard functions ignoring
the frailty, which for Model 2 corresponds
to a Weibull distribution. If t = 0 then $h_U(t) = h(t)$
(because ln[S(0)] = 0), which for Model 2 yields an
estimated increasing hazard. As t gets larger, and if
$\theta > 0$, the denominator gets larger (because ln[S(t)]
is negative) until eventually $h_U(t)$ approaches zero.
So $h_U(t)$ is increasing at t = 0 but eventually
decreases to zero, which means at some point in time,
$h_U(t)$ changes direction.
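
The unimodal shape can be checked numerically. The following is a minimal sketch, not the fitted model: it uses the reported estimates p = 1.54 and theta = 0.861, but the Weibull scale `LAM` is a hypothetical value (the covariate-dependent scale is not reproduced here), so only the shape of the curve is meaningful.

```python
P = 1.54       # shape estimate reported for Model 2
THETA = 0.861  # frailty variance estimate reported for Model 2
LAM = 0.01     # hypothetical Weibull scale, chosen only for illustration

def weibull_hazard(t):
    return LAM * P * t ** (P - 1)

def weibull_log_survival(t):
    # ln S(t) for S(t) = exp(-LAM * t**P)
    return -LAM * t ** P

def unconditional_hazard(t, theta=THETA):
    # h_U(t) = h(t) / (1 - theta * ln S(t))
    return weibull_hazard(t) / (1.0 - theta * weibull_log_survival(t))

times = [0.5 * k for k in range(1, 2001)]          # grid from 0.5 to 1000
values = [unconditional_hazard(t) for t in times]
peak_index = max(range(len(values)), key=values.__getitem__)
print("unconditional hazard peaks near t =", times[peak_index])
```

Setting `theta=0.0` in `unconditional_hazard` returns the Weibull hazard itself, matching the no-frailty case.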


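Population level hazards (with
gamma frailty)

$$h_{U1}(t) = \frac{h_1(t)}{1 - \theta \ln[S_1(t)]} \quad \text{for TX = 1}$$

$$h_{U2}(t) = \frac{h_2(t)}{1 - \theta \ln[S_2(t)]} \quad \text{for TX = 2}$$


Ratio of unconditional hazards (not
PH)

$$\frac{h_{U2}(t)}{h_{U1}(t)} = \frac{h_2(t)}{h_1(t)} \times \frac{1 - \theta \ln[S_1(t)]}{1 - \theta \ln[S_2(t)]}$$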


A consequence of the frailty effect is the need to
distinguish between the ratio of individual level
hazards and the ratio of population level hazards.
For the population level hazards, the PH assumption
is violated when a gamma (or inverse-Gaussian)
distributed frailty is added to a PH
model. To see this for gamma frailty, let $h_{U1}(t)$ and
$h_{U2}(t)$ be the unconditional hazard functions
representing the standard and test treatments,
respectively, at the mean levels of the other covariates.
The ratio of these hazards is shown on the left.
If

$$\frac{h_2(t)}{h_1(t)} = \exp(\beta_1)$$

then

$$\frac{h_{U2}(t)}{h_{U1}(t)} = \exp(\beta_1) \times \frac{1 - \theta \ln[S_1(t)]}{1 - \theta \ln[S_2(t)]}$$

not constant over time,

PH violated


If the ratio of $h_2(t)$ and $h_1(t)$ equals $\exp(\beta_1)$,
then the ratio of the unconditional hazards equals
$\exp(\beta_1)$ times the ratio of $1 - \theta \ln[S_1(t)]$ and
$1 - \theta \ln[S_2(t)]$. This latter ratio is a function of time
and only cancels when t equals zero. Therefore the
ratio of the unconditional hazards is not constant
over time, thus violating the PH assumption.
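
A short numerical sketch makes the violation visible. It assumes Weibull individual level hazards that satisfy PH exactly; the scale `LAM1` and the coefficient `BETA1` are hypothetical values used only for illustration, while p and theta are the reported estimates.

```python
import math

P, THETA = 1.54, 0.861          # estimates reported for Model 2
LAM1 = 0.01                     # hypothetical scale for the standard treatment
BETA1 = 0.5                     # hypothetical individual level PH coefficient
LAM2 = LAM1 * math.exp(BETA1)   # so that h_2(t) / h_1(t) = exp(BETA1) for all t

def unconditional_hazard(t, lam):
    hazard = lam * P * t ** (P - 1)     # Weibull h(t)
    log_survival = -lam * t ** P        # Weibull ln S(t)
    return hazard / (1.0 - THETA * log_survival)

def unconditional_hazard_ratio(t):
    return unconditional_hazard(t, LAM2) / unconditional_hazard(t, LAM1)

check_times = (0.001, 5.0, 20.0, 100.0)
ratios = [unconditional_hazard_ratio(t) for t in check_times]
for t, r in zip(check_times, ratios):
    print(t, round(r, 3))
```

Near t = 0 the ratio is essentially exp(BETA1); it then drifts toward 1 as t grows, so the population level hazards are not proportional even though the individual level hazards are.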


Plots of S(t) 

• Generally averaged over 
population 

o An important consideration 
for frailty models 


Generally, survival plots are estimated over a pop¬ 
ulation average (e.g., Kaplan-Meier). When con¬ 
sidering PH models without frailty, we do not need 
to distinguish between the conditional and uncon¬ 
ditional survival functions. However, this distinc¬ 
tion needs to be considered with frailty models. 


Suppose ln[-ln S(t)] curves for TX
start parallel but then converge over
time:

1. It may be effect of TX weakens
over time
⇓
PH model not appropriate

2. It may be effect of TX is constant
over time but unobserved
heterogeneity is in
population
⇓
PH model with frailty is
appropriate

Model 2 (Weibull with frailty) 

• Used PH parameterization 

• Can equivalently use AFT 
parameterization 


Suppose we plot Kaplan-Meier log-log survival es¬ 
timates evaluating the PH assumption for treat¬ 
ment (TX = 2 vs. TX = 1), and the plots start out 
parallel but then begin to converge over time. One 
interpretation is that the effect of the treatment 
weakens over time. For this interpretation, a PH 
model is not appropriate. Another interpretation 
is that the effect of the treatment remains con¬ 
stant over time but the plots converge due to un¬ 
observed heterogeneity in the population. For this 
interpretation, a PH model with frailty would be 
appropriate. 


Recall, from Section VI of this chapter, that a 
Weibull PH model is also an AFT model. The only 
difference is in the way the model is parameter¬ 
ized. We next present the AFT form of Model 2. 
Unconditional survival function
$S_U(t)$ with gamma frailty $g(\alpha)$

$$S_U(t) = \int_0^\infty S(t|\alpha)\, g(\alpha)\, d\alpha = [1 - \theta \ln S(t)]^{-1/\theta}$$


Before stating the model, we show the uncondi¬ 
tional survival function using gamma frailty. Re¬ 
call that the unconditional survival function is ob¬ 
tained by integrating over the frailty, as shown on 
the left. 


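Model 3 (Weibull AFT with gamma
frailty)

$$S_U(t) = [1 - \theta \ln S(t)]^{-1/\theta}$$

where $S(t) = \exp(-\lambda t^p)$ (Weibull)
and

$$\frac{1}{\lambda^{1/p}} = \exp(\alpha_0 + \alpha_1 \mathrm{TX} + \alpha_2 \mathrm{PERF} + \alpha_3 \mathrm{DD} + \alpha_4 \mathrm{AGE} + \alpha_5 \mathrm{PRIORTX})$$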


Model 3 (the AFT form of Model 2) is presented in
terms of the unconditional survival function $S_U(t)$.
The unconditional survival function is a function
of S(t), which represents the Weibull survival function.
The Weibull survival function, in turn, is
parameterized in terms of the shape parameter p
and regression coefficients using AFT parameterization
(see left).


Model 3 Output 


Weibull regression (AFT form)
Gamma frailty

Log likelihood = -200.11338

_t          Coef.    Std. Err.      z      P>|z|
tx         -.068      .190       -0.36     0.721
perf        .040      .005        8.37     0.000
dd          .004      .009        0.44     0.661
age         .008      .009        0.89     0.376
priortx     .004      .023        0.18     0.860
_cons      1.460      .752        1.94     0.052
/ln_p       .435      .141        3.09     0.002
/ln_the    -.150      .382       -0.39     0.695
p          1.54       .217
1/p         .647      .091
theta       .861      .329

Likelihood ratio test of theta = 0:
chibar2(01) = 12.18
Prob>=chibar2 = 0.000


The output for Model 3, shown on the left, is
similar to that obtained from Model 2. The
estimates for theta and p are identical to those
obtained from Model 2. The difference is that
the regression coefficients obtained with Model 3
use AFT parameterization (multiply each by −p to get
the PH coefficient estimates in Model 2).

An estimated acceleration factor of 0.93 comparing
two individuals with the same level of
frailty, for the effect of treatment (TX = 2 vs.
TX = 1) and controlling for the other covariates,
is obtained by exponentiating the estimated
coefficient (−0.068) of the TX variable.


$\gamma$(TX = 2 vs. 1) = exp(−0.068)
= 0.93


Comparing individuals with same $\alpha$
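
The arithmetic behind these quantities is easy to verify. The sketch below uses only the tx coefficient and the shape estimate from the Model 3 output; the conversion to a PH coefficient applies the rule stated above (multiply the AFT coefficient by −p), and the variable names are illustrative.

```python
import math

alpha_tx = -0.068   # AFT coefficient for tx from the Model 3 output
p = 1.54            # estimated Weibull shape parameter

acceleration_factor = math.exp(alpha_tx)   # gamma for TX = 2 vs. TX = 1
beta_tx = -alpha_tx * p                    # implied PH coefficient (beta = -alpha * p)
hazard_ratio = math.exp(beta_tx)           # conditional HR at the same frailty level

print(round(acceleration_factor, 2))
print(round(beta_tx, 3), round(hazard_ratio, 2))
```

An acceleration factor below 1 and an implied hazard ratio above 1 point in the same direction: both describe a slightly harmful estimated effect of the test treatment.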
Interpreting $\gamma$

• Taking test treatment reduces
individual’s median survival
time by factor of 0.93

• Suggests slightly harmful effect

• $\hat{\alpha}_1$ is not significant (p = 0.721)


Another interpretation for this estimate is that an
individual taking the test treatment instead of the
standard treatment reduces her median survival
time (i.e., contracts her individual level survival
function) by an estimated factor of 0.93. This estimate
suggests a slight harmful effect from the test
treatment compared to the standard treatment.
However, the estimated coefficient for TX is not
significant, with a p-value of 0.721.


PH assumption

Individual level PH ⇏ Population
level PH

AFT assumption

Individual level AFT ⇒ Population
level AFT


A key difference between the PH and AFT formulations
of this model is that if the AFT assumption
holds at the individual level, then it also
holds at the population level under gamma (or
inverse-Gaussian) distributed frailty, whereas the
PH assumption does not carry over in this way.


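Population level survival (with
gamma frailty)

$$S_{U1}(t) = [1 - \theta \ln S_1(t)]^{-1/\theta}$$

$$S_{U2}(t) = [1 - \theta \ln S_2(t)]^{-1/\theta}$$


If $S_1(t) = S_2(\gamma t)$
then

$$S_{U1}(t) = [1 - \theta \ln S_1(t)]^{-1/\theta} = [1 - \theta \ln S_2(\gamma t)]^{-1/\theta} = S_{U2}(\gamma t)$$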


To see this for gamma frailty, let $S_{U1}(t)$ and $S_{U2}(t)$
be the unconditional survival functions representing
the standard and test treatments, respectively,
at the mean levels of the other covariates.

Also let $\gamma$ represent the individual level acceleration
factor for treatment; that is, $S_1(t) = S_2(\gamma t)$.
Then $S_{U1}(t) = S_{U2}(\gamma t)$ (see left).


Thus, 

Individual level AFT 
=> Population level AFT 


Thus, for models with gamma frailty, if the AFT as¬ 
sumption holds at the individual level then it also 
holds at the population level. 
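
The same conclusion can be confirmed numerically. This is a sketch under assumptions: the Weibull scale `LAM2` is hypothetical, and `LAM1` is derived from it so that the individual level AFT relation S_1(t) = S_2(γt) holds; p, theta, and the TX coefficient are taken from the Model 3 output.

```python
import math

P, THETA = 1.54, 0.861        # estimates reported for Model 3
GAMMA = math.exp(-0.068)      # acceleration factor for TX from Model 3
LAM2 = 0.01                   # hypothetical Weibull scale for the test treatment
LAM1 = LAM2 * GAMMA ** P      # chosen so that S_1(t) = S_2(GAMMA * t)

def survival_1(t):
    return math.exp(-LAM1 * t ** P)

def survival_2(t):
    return math.exp(-LAM2 * t ** P)

def unconditional_survival(survival, t):
    # S_U(t) = [1 - theta * ln S(t)] ** (-1 / theta)
    return (1.0 - THETA * math.log(survival(t))) ** (-1.0 / THETA)

for t in (1.0, 10.0, 50.0):
    print(t,
          unconditional_survival(survival_1, t),
          unconditional_survival(survival_2, GAMMA * t))
```

The two printed columns agree at every time point, so the same acceleration factor links the population level survival functions.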
Coefficient estimates from Model 3

• Applies to individual or
population

• Interpretation of $\exp(\hat{\alpha}_1)$ = 0.93

o Median survival time for
individual reduced by factor
of 0.93

o Median survival time
reduced in population by
factor of 0.93


The coefficient estimates obtained from Model 3 
can therefore be used at the population level as 
well as the individual level. So another interpreta¬ 
tion for the estimated acceleration factor for treat¬ 
ment is that the test treatment reduces the median 
survival time in the population by an estimated 
factor of 0.93. 


Models 2 and 3: 

Same model, different 
parameterization 
Same estimates for 
S(t), $S_U(t)$, h(t), $h_U(t)$


Model 2 and Model 3 are the same model but 
use different parameterizations. The models pro¬ 
vide identical estimates for the hazard and survival 
functions. 


Models 2 and 3: Weibull with 
gamma frailty 

• Unimodal unconditional hazard 

Log-logistic model 

• Accommodates unimodal 
hazard without a frailty 
component 


Recall that the estimated unconditional haz¬ 
ard function obtained from this frailty model is 
of unimodal shape. Alternatively, a log-logistic 
(or lognormal) model, which accommodates a 
unimodal-shaped hazard function, could have 
been run without the frailty (see Practice Exercises 
8 to 11 for comparison). 


Parametric likelihood with frailty

• Uses $f_U(t)$, where
$f_U(t) = h_U(t) S_U(t)$

• Formulated similarly to that
described in Section X with $f_U(t)$
replacing f(t)

• Additional parameter $\theta$

The likelihood for Model 3 can be formulated using
the unconditional probability density function
$f_U(t)$, which is the product of the unconditional
hazard and survival functions. The likelihood is
constructed in a similar manner to that described
previously in this chapter except that $f_U(t)$ is used
for the likelihood rather than f(t) (see Section X).
The main difference is that there is one additional
parameter to estimate, the variance of the frailty.
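
As a rough sketch of how such a likelihood could be evaluated, the function below sums log contributions for a Weibull model with gamma frailty, restricted for simplicity to exact event times and right-censored times. The function name, the toy data, and the parameter values are all hypothetical; fitting software would maximize this quantity over the parameters rather than evaluate it once.

```python
import math

def log_likelihood(times, events, lam, p, theta):
    """Log-likelihood with ln f_U(t) for events and ln S_U(t) for right-censored times."""
    total = 0.0
    for t, d in zip(times, events):
        log_s = -lam * t ** p                  # ln S(t), Weibull
        hazard = lam * p * t ** (p - 1)        # h(t), Weibull
        denom = 1.0 - theta * log_s
        log_s_u = -math.log(denom) / theta     # ln S_U(t)
        if d == 1:
            log_h_u = math.log(hazard) - math.log(denom)   # ln h_U(t)
            total += log_h_u + log_s_u         # ln f_U(t) = ln h_U(t) + ln S_U(t)
        else:
            total += log_s_u
    return total

# Made-up toy data (not from the text): follow-up times and event indicators.
toy_times = [2.0, 5.0, 7.5, 11.0]
toy_events = [1, 0, 1, 1]
print(log_likelihood(toy_times, toy_events, lam=0.05, p=1.5, theta=0.8))
```

As theta approaches zero, the value approaches the ordinary Weibull log-likelihood, which is one way to see that theta is the only extra parameter.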
Shared Frailty

• Clusters share same frailty 

• For example, subjects from 
same family may share 
unobserved factors 

o Shared frailty designed to 
account for such similarities 

Unshared Frailty 

• The type of frailty we have 
described previous to this point 

• Frailty distributed 
independently among subjects 

Shared Frailty Models 

• Similar to random effect 
regression models 

• Accounts for within-cluster 
correlation 

• $\theta$ is a measure of the degree of
correlation

Hazard conditional on shared frailty
(for jth subject in kth cluster)

$h_{jk}(t|\alpha_k) = \alpha_k h_{jk}(t)$
where

$h_{jk}(t) = h(t|X_{jk})$
for $j = 1, 2, \ldots, n_k$
and total $n_k$ subjects in kth cluster

If family is the cluster variable,
then

subjects of same family have same
$\alpha_k$


Another type of frailty model is the shared frailty 
model. With this model, clusters of subjects are 
assumed to share the same frailty. For example, 
subjects from the same family may be similar with 
respect to some unobserved genetic or environ¬ 
mental factors. Allowing family members to share 
the same frailty is designed to account for such 
similarities. 

By contrast, the frailty described previous to this 
point (unshared frailty) has been assumed to be 
distributed independently among subjects. 


Adding shared frailty to a survival model plays an
analogous role to that of adding a random effect to
a linear regression as a way to account for correlation
between clusters of observations (Kleinbaum
and Klein 2002). The estimate for the variance
parameter $\theta$ in a shared frailty model can be thought
of as a measure of the degree of correlation, where
$\theta = 0$ indicates no within-cluster correlation.

For a shared frailty model, the conditional hazard
function for the jth subject from the kth cluster
can be expressed as $\alpha_k$ multiplied by $h_{jk}(t)$ where
$h_{jk}(t)$ depends on the subject’s covariates $X_{jk}$.
Notice that the frailty $\alpha_k$ is subscripted by k, but not
by j. This indicates that subjects from the same
cluster share the same frailty. If, for example,
subjects are clustered by family, then subjects from
the same family are assumed to have the same
frailty.
Shared and unshared frailty

• Fundamentally the same 

o Accounts for variation due 
to unobservable factors 

• Difference in data to which they 
are applied 

o Affects interpretation and 
methods of estimation 

Unshared frailty models 

• Subjects assumed independent 

Shared frailty models 

• Accounts for dependence 
among subjects who share 
frailty 

Su(t) and hu(t) 

• Population averages in 
unshared frailty models 

• Population averages in shared 
frailty models provided that 
cluster size is uncorrelated with 
frailty 

Likelihood for shared frailty models 

• More complicated than for 
unshared frailty models 

• Unconditional contribution of 
each cluster formulated 
separately by integrating out 
g(a) 

• Full likelihood formed as 
product of unconditional 
contribution from each cluster 


The frailty in a shared frailty model or unshared
frailty model is fundamentally the same: a random
effect to account for a source of variation due
to unobservable, or latent, factors. However, the
data to which shared and unshared frailty
are applied differ, which leads to differences
in interpretation and methods of estimation.


For unshared frailty models, a subject's survival is
assumed to be independent of the survival of other
subjects in the study population. For shared frailty
models, however, the frailty accounts for dependence
among subjects who share the same frailty.
Shared frailty provides an approach to account for
correlation in the data due to unobservable factors
common within clusters of subjects.


Recall, with unshared frailty models, we inter¬ 
preted the unconditional survival and hazard 
functions as representing population averages. 
With shared frailty models, however, these uncon¬ 
ditional functions may not strictly represent pop¬ 
ulation averages unless the number of subjects in 
a cluster is uncorrelated with the level of frailty. 


The formulation of the likelihood is more com¬ 
plicated for shared frailty models than it is for 
unshared frailty models. To construct the shared 
frailty likelihood, the unconditional contribution 
for each cluster of subjects is formulated sepa¬ 
rately by integrating out the frailty from the prod¬ 
uct of each subject's conditional contribution. The 
full likelihood is then formulated as the product of 
the contributions from each cluster (see Gutierrez 
2002 for details). 
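
The construction just described can be written compactly. In the sketch below, the symbol $L_{jk}(\alpha)$ is notation introduced here (it does not appear in the text) for the conditional likelihood contribution of the jth subject in the kth cluster given frailty $\alpha$, and $g(\alpha)$ is the frailty density.

```latex
% Unconditional contribution of cluster k: integrate the frailty out of the
% product of the conditional contributions of its n_k subjects.
L_k = \int_0^\infty \left[ \prod_{j=1}^{n_k} L_{jk}(\alpha) \right] g(\alpha)\, d\alpha

% Full likelihood: product of the unconditional cluster contributions.
L = \prod_{k} L_k
```

With one subject per cluster the integral reduces to the unshared frailty case, where each subject's unconditional contribution is formed on its own.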
Shared frailty in Cox model

• Provided by Stata

o Only gamma distributed
shared frailty available

• Accounts for within-group
correlation

Cox shared frailty model

$h_{jk}(t|\alpha_k) = \alpha_k h_0(t)\exp(\beta X_{jk})$

for $j = 1, 2, \ldots, n_k$
total of $n_k$ subjects in kth cluster


Up to this point we have discussed frailty in terms
of parametric models. Stata (version 8) allows
shared frailty to be included in a Cox model in order
to account for within-group correlation. The
conditional hazard function for the jth subject
from the kth cluster can be expressed as $\alpha_k$ multiplied
by the baseline hazard $h_0(t)$ multiplied by
$\exp(\beta X_{jk})$. The frailty component is assumed to
follow some distribution even though the distribution
is unspecified for the rest of the model. Stata
only allows a gamma distribution for the frailty to
be included with a Cox model.


PH violation of $h_U(t)$ in Cox model

• If gamma-distributed frailty
included

• Interpreting coefficient
estimates

o Only used for HR estimates
among those who share
same $\alpha$


If a gamma-distributed frailty component is added 
to the Cox model, then the PH assumption is not 
satisfied for the unconditional hazards. In this 
framework, the frailty in a Cox model can be 
thought of as a source of random error that causes 
violation of the PH assumption at the population 
level. Consequently, care must be taken in the in¬ 
terpretation of the coefficient estimates. They can 
only be used to obtain estimates for hazard ratios 
conditioned on the same level of frailty. 


Recurrent events 

• Multiple events from same 
subject 

• Events from same subject may 
be correlated 

• Clusters are formed 
representing each subject 

o Different subjects do not 
share frailty 

o Observations from same 
subject share frailty 


Shared frailty models can also be applied to re¬ 
current event data. It is reasonable to expect that 
multiple events occurring over follow-up from the 
same individual would be correlated. To handle 
within-subject correlation, clusters are formed, 
each containing observations from the same sub¬ 
ject. In this setting, it is not the case that different 
subjects share the same frailty. Rather, multiple 
observations representing the same subject share 
the same frailty. 


Recurrent events: 

• Topic of next chapter 
(Chapter 8) 


Survival analyses on recurrent events are the fo¬ 
cus of the next chapter (Chapter 8) of this text. 
An example of a Weibull model with shared frailty 
applied to recurrent event data is presented in the 
next chapter. 
XIII. Summary

Parametric Models 

• Assume distribution for survival 
time 

• Distribution specified in terms 
of parameters 

• Parameters estimated from data 


In this chapter we presented parametric survival 
models as an alternative to the Cox model. They 
are called parametric models because the distri¬ 
bution of the time-to-event variable is specified 
in terms of unknown parameters, which are esti¬ 
mated from the data. Distributions that are com¬ 
monly utilized are the exponential, the Weibull, 
the log-logistic, the lognormal, and the general¬ 
ized gamma. 


f(t) specified ⇒ corresponding S(t),
h(t) also determined


Moreover, 

Specifying one of f(t), S(t), or h(t) 
determines all three functions 


More precisely, for parametric survival models, it 
is the probability density function f(t) of the dis¬ 
tribution that is specified in terms of the parame¬ 
ters. Once f(t) is specified, the corresponding sur¬ 
vival and hazard functions S(t) and h(t) can also be 
determined. Moreover, specifying any one of the 
probability density function, survival function, or 
hazard function allows the other two functions to 
be determined. 
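
A brief numerical check illustrates how one function determines the others. Starting from a Weibull hazard with hypothetical parameter values, the sketch recovers S(t) by integrating the hazard and then f(t) as the product h(t)S(t), and compares the survival value with the closed form exp(−λt^p).

```python
import math

LAM, P = 0.05, 1.5   # hypothetical Weibull parameters, for illustration only

def hazard(t):
    return LAM * P * t ** (P - 1)

def survival_from_hazard(t, steps=20000):
    # S(t) = exp(-integral from 0 to t of h(u) du), using the midpoint rule
    width = t / steps
    cumulative = sum(hazard((k + 0.5) * width) for k in range(steps)) * width
    return math.exp(-cumulative)

def density_from_hazard(t):
    # f(t) = h(t) * S(t)
    return hazard(t) * survival_from_hazard(t)

t = 4.0
closed_form_survival = math.exp(-LAM * t ** P)
print(survival_from_hazard(t), closed_form_survival)
print(density_from_hazard(t), hazard(t) * closed_form_survival)
```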


Parametric models 

• Need not be PH models 

• Many are AFT models 


The proportional hazards (PH) assumption is the 
underlying assumption for a Cox PH model. How¬ 
ever, parametric survival models need not be pro¬ 
portional hazards models. Many parametric mod¬ 
els are acceleration failure time (AFT) models 
rather than proportional hazards models. 


Acceleration factor ($\gamma$)

• Key measure of association in
AFT models

• Describes stretching or
contraction of S(t)

AFT assumption
$S_2(t) = S_1(\gamma t)$

($S_2$: Group 2; $S_1$: Group 1)


The acceleration factor ($\gamma$) is the key measure
of association obtained in an AFT model. It
describes the “stretching out” or contraction of
survival functions when comparing one group to
another. If $S_1(t)$ and $S_2(t)$ are the survival
functions for Group 1 and Group 2, respectively, then
the AFT assumption can be expressed as
$S_2(t) = S_1(\gamma t)$.


Detailed examples presented:

• Exponential model

• Weibull model

• Log-logistic model


We presented detailed examples of the exponential,
Weibull, and log-logistic models using the
remission dataset.
Exponential Model

• $h(t) = \lambda$ (constant hazard)

• Special case of Weibull model

Weibull Model

• AFT ⇔ PH

Log-logistic Model

• Not a PH model

• AFT ⇔ PO


The underlying assumption for an exponential
model, a special case of the Weibull model, is that
the hazard function is constant over time (i.e.,
$h(t) = \lambda$). The Weibull model is unique in that if the
PH assumption holds then the AFT assumption
also holds (and vice versa). The log-logistic model
does not accommodate the PH assumption. However,
if the AFT assumption holds in a log-logistic
model, then the proportional odds (PO) assumption
also holds (and vice versa).


PO assumption

$$\mathrm{OR} = \frac{S(t,x^*)/[1 - S(t,x^*)]}{S(t,x)/[1 - S(t,x)]}$$

OR is constant over time


The idea underlying the proportional odds as¬ 
sumption is that the survival (or failure) odds ra¬ 
tio comparing two specifications of covariates re¬ 
mains constant over time. 


Graphical Evaluation

Weibull and Exponential:
• Plot $\ln[-\ln \hat{S}(t)]$ against ln(t)

Log-logistic:
• Plot $\ln\left[\frac{\hat{S}(t)}{1 - \hat{S}(t)}\right]$ against ln(t)

Check for linearity

Presented other parametric models

• Generalized gamma model

• Lognormal model

• Gompertz model


We presented graphical approaches for evaluating
the appropriateness of the exponential, Weibull,
and log-logistic models by plotting a function of
the Kaplan-Meier survival estimates $\hat{S}(t)$ against
the log of time and then checking for linearity.

For evaluation of the exponential and Weibull
assumptions, $\ln[-\ln \hat{S}(t)]$ is plotted against
ln(t), and for evaluation of the log-logistic
assumption the log odds of $\hat{S}(t)$ is plotted against
ln(t).

We briefly discussed other parametric models
such as the generalized gamma, lognormal, and
Gompertz models and showed additional parametric
approaches such as modeling ancillary
(shape) parameters as a function of predictor variables.
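
The linearity that these plots look for can be seen directly from the model formulas. The sketch below is an idealized check: it uses exact Weibull and log-logistic survival functions with hypothetical parameter values in place of Kaplan-Meier estimates, and computes the least-squares slope of each transformed curve against ln(t).

```python
import math

LAM, P = 0.05, 1.5   # hypothetical parameters, for illustration only

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

times = [1.0, 2.0, 4.0, 8.0, 16.0]
log_t = [math.log(t) for t in times]

# Weibull: ln[-ln S(t)] = ln(lambda) + p * ln(t)
weibull_s = [math.exp(-LAM * t ** P) for t in times]
log_neg_log_s = [math.log(-math.log(s)) for s in weibull_s]

# Log-logistic: ln[S(t) / (1 - S(t))] = -ln(lambda) - p * ln(t)
loglogistic_s = [1.0 / (1.0 + LAM * t ** P) for t in times]
log_odds_s = [math.log(s / (1.0 - s)) for s in loglogistic_s]

weibull_slope = slope(log_t, log_neg_log_s)
loglogistic_slope = slope(log_t, log_odds_s)
print(weibull_slope, loglogistic_slope)
```

With real data the Kaplan-Meier estimates replace the exact survival values, so the points scatter around a line rather than falling exactly on it.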
Contributions to Likelihood

If event at t, contributes f(t)

If censored, integrate over f(t)

$$\int_0^{t_1} f(t)\,dt: \text{ left-censored at } t_1$$

$$\int_{t_1}^{\infty} f(t)\,dt: \text{ right-censored at } t_1$$

$$\int_{t_1}^{t_2} f(t)\,dt: \text{ interval-censored from } t_1 \text{ to } t_2$$

Full likelihood (L)

$$L = \prod_{j=1}^{N} L_j, \quad j = 1, 2, \ldots, N$$

where $L_j$ is the contribution from
jth subject

Binary regression for interval- 
censored data 

• Follow-up divided into intervals 
o Allows for multiple 

observations per subject 

• Binary outcome variable 
defined 

o Indicates survival or failure 
over each interval 

Binary regression for discrete sur¬ 
vival analysis 

• Analogous to interval-censored 
data 

o Discrete outcome—subjects 
survive discrete units of time 
o Interval outcomes— 
subjects survive intervals 
of time 


We developed the parametric likelihood, including
a discussion of left-, right-, and interval-censored
data. If a subject has an event at time t,
then that subject's contribution to the likelihood
is f(t). On the other hand, if a subject is censored
(i.e., exact time of event unknown), then the
subject's contribution to the likelihood is found by
integrating over f(t). The integration limits are
determined by the time and type of censorship (see
left).


Assuming independence among subjects, the full
likelihood can be formulated as a product of each
subject's contribution.
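
The contributions listed above can be sketched in code for a Weibull model, where each integral of f(t) has a closed form in terms of S(t). The parameter values, the `contribution` helper, and the four example subjects are hypothetical and serve only to illustrate the bookkeeping.

```python
import math

LAM, P = 0.05, 1.5   # hypothetical Weibull parameters

def survival(t):
    return math.exp(-LAM * t ** P)

def density(t):
    return LAM * P * t ** (P - 1) * survival(t)

def contribution(kind, t1, t2=None):
    """Likelihood contribution of one subject, by type of observation."""
    if kind == "event":
        return density(t1)                  # f(t) at the event time
    if kind == "left":
        return 1.0 - survival(t1)           # integral of f from 0 to t1
    if kind == "right":
        return survival(t1)                 # integral of f from t1 to infinity
    if kind == "interval":
        return survival(t1) - survival(t2)  # integral of f from t1 to t2
    raise ValueError(f"unknown observation type: {kind}")

# Made-up subjects: (type, t1) or (type, t1, t2)
subjects = [("event", 3.0), ("right", 8.0), ("left", 2.0), ("interval", 4.0, 6.0)]
likelihood = math.prod(contribution(*s) for s in subjects)
print(likelihood)
```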


We showed how binary regression could be ap¬ 
plied to interval-censored data by defining a di¬ 
chotomous outcome variable indicating subjects' 
survival or failure over each interval of their 
follow-up. The data layout for this type of anal¬ 
ysis allows multiple observations per subject, rep¬ 
resenting intervals of survival prior to failure (or 
censorship). 


Binary regression can also be used for discrete sur¬ 
vival analysis in which the “time-to-event” variable 
is considered discrete rather than continuous. The 
data layout is similar to that for interval-censored 
data except subjects are conceptualized as surviv¬ 
ing discrete units of time rather than continuous 
intervals of time. 
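
A small sketch of the data layout may help. The function and column names below are hypothetical, and the example simply expands each subject into one row per interval (or discrete time unit) survived, with the binary outcome equal to 1 only in the interval where failure occurs.

```python
def expand_to_intervals(subject_id, last_interval, failed):
    """Return one row per interval of follow-up for a single subject."""
    rows = []
    for interval in range(1, last_interval + 1):
        # outcome is 1 only in the final interval, and only if the subject failed
        outcome = 1 if (failed and interval == last_interval) else 0
        rows.append({"id": subject_id, "interval": interval, "event": outcome})
    return rows

# Subject A fails in interval 3; subject B is censored after interval 2.
rows = expand_to_intervals("A", 3, True) + expand_to_intervals("B", 2, False)
for row in rows:
    print(row)
```

A binary regression fitted to rows of this form treats each interval as a separate observation on the same subject.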
Frailty, $\alpha$
$h(t|\alpha) = \alpha h(t)$

multiplicative effect on h(t)
mean = 1, variance = $\theta$

$\theta$ estimated from data


We concluded with a discussion of frailty models.
The frailty $\alpha$ is a multiplicative random effect
on the hazard designed to account for individual-level
unobserved factors that add an extra layer of
variability beyond what has already been specified
in the model. The frailty is generally assumed to
follow a distribution with mean equal to 1 and is
typically parameterized in terms of the variance $\theta$,
which is estimated from the data.


Chapters 

1. Introduction to Survival 
Analysis 

2. Kaplan-Meier Curves and the 
Log-Rank Test 

3. The Cox Proportional Hazard 
Model 

4. Evaluating the Proportional 
Hazards Assumption 

5. The Stratified Cox Procedure 

6. Extension of the Cox
Proportional Hazards Model for
Time-Dependent Covariates

✓ 7. Parametric Survival Models


The presentation is now complete. The reader 
can review the detailed outline that follows and 
then answer the practice exercises and test. 

In the next chapter (8) entitled “Recurrent 
Event Survival Analysis,” we consider approaches 
for analyzing data in which individuals may have 
more than one event over the course of their 
follow-up. 


Next: 


8. Recurrent Event Survival 
Analysis 
Detailed Outline


I. Overview

A. Parametric Survival Models 

i. Outcome assumed to follow specified 
distribution 

ii. Weibull, exponential (a special case of the 
Weibull), log-logistic, lognormal, and 
generalized gamma are supported with 
popular software (SAS and Stata) 

iii. Contrasts with Cox model in which baseline 
hazard and survival functions are not specified 

II. Probability Density Function in Relation to the
Hazard and Survival Function

A. If any one of the hazard h(t), survival S(t), or
probability density f(t) functions is known then
the other two functions can be determined.

B. If f(t) is specified, then $S(t) = \int_t^\infty f(u)\,du$

C. If S(t) is specified, then
$h(t) = (-d[S(t)]/dt)/S(t)$ and
$f(t) = -d[S(t)]/dt$

D. If h(t) is specified, then $S(t) = \exp(-\int_0^t h(u)\,du)$
and $f(t) = h(t)S(t)$

III. Exponential Example

A. Hazard is constant (i.e., not a function of time) in
an exponential model

i. Stronger assumption than the PH assumption
that the HR is constant

B. Exponential PH model (one predictor $X_1$)

i. In terms of the hazard: $h(t) = \lambda$ where
$\lambda = \exp(\beta_0 + \beta_1 X_1)$

ii. Hazard ratio: HR ($X_1 = 1$ vs. $X_1 = 0$) $= \exp(\beta_1)$

IV. Acceleration Failure Time Assumption

A. Underlying assumptions 

i. AFT—effect of covariates is multiplicative with 
respect to survival time 

ii. PH—effect of covariates is multiplicative with 
respect to the hazard 
B. The acceleration factor ($\gamma$) is the key measure
of association in an AFT model

i. Acceleration factor is a ratio of survival times
corresponding to any fixed value of S(t), that
is, $t_A/t_B$ where A and B denote two individuals
for which $S(t_A) = S(t_B)$

ii. $S_2(t) = S_1(\gamma t)$, survival function for Group 1,
$S_1(t)$ is stretched (or contracted) by a factor of
$\gamma$ compared to survival function for Group 2,
$S_2(t)$

C. AFT illustration

i. Dogs are said to grow older 7 times faster
than humans, $S_D(t) = S_H(7t)$

V. Exponential Example Revisited

A. Exponential AFT model (one predictor $X_1$)

i. In terms of survival: $S(t) = \exp(-\lambda t)$ where
$\lambda = \exp[-(\alpha_0 + \alpha_1 X_1)]$

ii. In terms of time:
$t = [-\ln S(t)] \times \exp(\alpha_0 + \alpha_1 X_1)$

iii. Acceleration factor ($X_1 = 1$ vs.
$X_1 = 0$), $\gamma = \exp(\alpha_1)$

B. An exponential PH model is an exponential AFT
model (but uses different parameterization)

i. $\beta_j = -\alpha_j$, where $\beta_j$ and $\alpha_j$ are PH and AFT
parameterization for the jth covariate

ii. $\gamma > 1$ for ($X_1 = 1$ vs. $X_1 = 0$) implies effect
of $X_1 = 1$ is beneficial to survival

iii. HR > 1 for ($X_1 = 1$ vs. $X_1 = 0$) implies effect
of $X_1 = 1$ is harmful to survival

C. Exponential model is a special case of a Weibull 

model 

i. Graphical approach for evaluating 
appropriateness of exponential model is 
described in the section on the Weibull 
example 

VI. Weibull Example

A. PH form of the Weibull model (one predictor $X_1$)

i. In terms of the hazard: $h(t) = \lambda p t^{p-1}$ where
$\lambda = \exp(\beta_0 + \beta_1 X_1)$

ii. Hazard ratio: HR ($X_1 = 1$ vs.
$X_1 = 0$) $= \exp(\beta_1)$
iii. Weibull hazard is monotonic with its
direction determined by the value of the
shape parameter p

a. p > 1 hazard increases over time

b. p = 1 constant hazard (exponential model)

c. p < 1 hazard decreases over time

B. Graphical approach for evaluating appropriateness
of Weibull model

i. Plot the log negative log of the Kaplan-Meier
survival estimates against the log of time for
each pattern of covariates

a. If Weibull assumption is correct then plots
should be straight lines of slope p

b. If exponential assumption is correct then
plots should be straight lines with slope
equal to one (p = 1)

c. If plots are parallel straight lines then
Weibull PH and AFT assumptions are
reasonable

C. AFT form of the Weibull model (one predictor $X_1$)

i. In terms of survival:
$S(t) = \exp(-\lambda t^p) = \exp[-(\lambda^{1/p} t)^p]$ where
$\lambda^{1/p} = \exp[-(\alpha_0 + \alpha_1 X_1)]$

ii. In terms of time:
$t = [-\ln S(t)]^{1/p} \times \exp(\alpha_0 + \alpha_1 X_1)$

iii. Acceleration factor ($X_1 = 1$ vs. $X_1 = 0$),
$\gamma = \exp(\alpha_1)$

D. A Weibull PH model is a Weibull AFT model (but
uses different parameterization)

i. Unique property of Weibull model
(exponential is special case, p = 1)

ii. $\beta_j = -\alpha_j p$ where $\beta_j$ and $\alpha_j$ are PH and AFT
parameterization, respectively, for the jth
covariate

VII. Log-Logistic Example

A. Log-logistic hazard function:
$h(t) = \lambda p t^{p-1}/(1 + \lambda t^p)$.

i. p < 1 hazard decreases over time

ii. p > 1 hazard first increases and then
decreases over time (unimodal)
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B. 


C. 


D. 


E. 


Graphical approach for evaluating 
appropriateness of log-logistic model 
i. Plot the log of the survival odds (using KM 
estimates) against the log of time for each 
pattern of covariates 

a. If log-logistic assumption is correct then 
plots should be straight line of slope — p 

b. If plots are parallel straight lines then 
log-logistic proportional odds (PO) and AFT 
assumptions are reasonable 

Log-logistic AFT model (one predictor Xi): 


i. In terms of survival: 


S(t)= 1/(1 + AM) = 1/(1 + (A 1/p t) p ) 
where A 1/p = exp(—(«o + aiXi)) 


ii. In terms of time: 
- , M/p 

t — —1— — 1 

L - S(t) 1 


M/p 

x exp(a 0 + aiXi) 


iii. Acceleration factor (Xi = 1 vs. X| = 0), 

Y = exp(ai) 

Log-logistic proportional odds (PO) model 
(one predictor Xi) 

i. In terms of survival: S(t) = 1/(1 + AM) where 
A = exp((3 0 + (SjXj) 

ii. Odds of an event (failure odds) by time t: 

(1 - S(t))/S(t) = AM 

iii. Odds of surviving event (survival odds) 
beyond t: S(t)/(1 — S(t)) = 1/AM 


iv. Failure odds ratio: 


HR (Xi = 1 vs. X! = 0) = exp((3j) 
a. PO assumption is that the odds ratio is 
constant over time 

v. Survival odds ratio: 


HR (Xi = 1 vs. X! = 0) = exp(-Pj) 
a. Survival odds ratio is reciprocal of failure 
odds ratio 

A log-logistic AFT model is a log-logistic PO 
model (but uses different parameterization) 
i. Log-logistic model is not a proportional 
hazards (PH) model 


ii. (3j = — a,/? where |3j and a, are PO and AFT 
parameterization for the jth covariate 
a. Shape parameter with Stata is 
parameterized as gamma = 1 /p 

VIII. A More General Form of the AFT Model
(pages 282-284)

A. General form with one predictor ($X_1$):
$\ln(T) = \alpha_0 + \alpha_1 X_1 + \epsilon$

B. Include additional parameter, $\sigma$:
$\ln(T) = \alpha_0 + \alpha_1 X_1 + \sigma\epsilon$

C. Let $\sigma = 1/p \Rightarrow \ln(T) = \alpha_0 + \alpha_1 X_1 + (1/p)\epsilon$

D. Additive in terms of ln(T) but multiplicative
in terms of T:

$T = \exp\left[\alpha_0 + \alpha_1 X_1 + \frac{1}{p}\epsilon\right] = \exp[\alpha_0 + \alpha_1 X_1] \times \exp\left(\frac{1}{p}\epsilon\right)$

E. Collapse $\alpha_0$ into baseline term,
let $T_0 = \exp(\alpha_0)\exp\left(\frac{1}{p}\epsilon\right)$:

so $T = \exp(\alpha_1 X_1) \times T_0$


IX. Other Parametric Models (pages 284-286) 

A. Generalized gamma model 

i. Additional shape parameters give flexibility 
in distribution shape 

ii. Weibull and lognormal are special cases 

B. Lognormal model 

i. ln(T) follows a normal distribution 

ii. Accommodates AFT model 

C. Gompertz model 

i. PH model, not AFT model 

D. Modeling failure time as an additive model 
i. Additive model with one predictor:
$T = \alpha_0 + \alpha_1 \mathrm{TRT} + \epsilon$ (no log link)

D. Modeling ancillary parameters 

i. Typically shape parameter p is considered a 
fixed constant 

ii. Can reparameterize shape parameter in 
terms of predictor variables and regression 
coefficients 


X. The Parametric Likelihood (pages 286-289) 

A. Product of each subject contribution 
(assuming independence) 

B. Subject's contribution uses probability 
density function f(t) 

i. Subject contributes f(t) if event is observed at
time t

ii. Integrate over f(t) if subject is censored

a. Integrate from 0 to t if subject is 
left-censored at t 

b. Integrate from t to infinity if subject is 
right-censored at t 

c. Integrate over interval of censorship if 
subject is interval-censored 
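
The likelihood contributions listed above can be sketched in code. The example below is an illustration under stated assumptions, not the chapter's own computation: it assumes a Weibull model with survival function exp(-lam * t^p), the subjects and parameter values are hypothetical, and the function names are invented for this sketch. Because the integral of f over an interval equals a difference of survival probabilities, no numerical integration is needed.

```python
import math

def weibull_sf(t, lam, p):
    """Survival function S(t) = exp(-lam * t^p)."""
    return math.exp(-lam * t ** p)

def weibull_pdf(t, lam, p):
    """Density f(t) = h(t) * S(t) = lam * p * t^(p-1) * exp(-lam * t^p)."""
    return lam * p * t ** (p - 1) * weibull_sf(t, lam, p)

def contribution(kind, lam, p, t=None, t_low=None, t_high=None):
    """Likelihood contribution of one subject under a Weibull model."""
    if kind == "event":      # event observed at t: f(t)
        return weibull_pdf(t, lam, p)
    if kind == "right":      # integral of f from t to infinity = S(t)
        return weibull_sf(t, lam, p)
    if kind == "left":       # integral of f from 0 to t = 1 - S(t)
        return 1.0 - weibull_sf(t, lam, p)
    if kind == "interval":   # integral of f over (t_low, t_high)
        return weibull_sf(t_low, lam, p) - weibull_sf(t_high, lam, p)
    raise ValueError(f"unknown censoring type: {kind}")

# Hypothetical subjects and parameter values, for illustration only.
lam, p = 0.05, 1.2
subjects = [
    {"kind": "event", "t": 4.0},
    {"kind": "right", "t": 10.0},
    {"kind": "left", "t": 2.0},
    {"kind": "interval", "t_low": 3.0, "t_high": 6.0},
]

# Independence assumption: the likelihood is the product of the contributions,
# so the log likelihood is the sum of their logs.
log_likelihood = sum(math.log(contribution(lam=lam, p=p, **s)) for s in subjects)
print(f"log likelihood = {log_likelihood:.4f}")
```

In practice the log likelihood would be maximized over lam and p; here it is only evaluated at one fixed pair of values.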

XI. Interval-Censored Data (pages 289-294) 

A. Binary regression is alternative approach if 
data are interval-censored 

B. Binary outcome variable represents survival 
or failure over each subinterval of subject’s 
follow-up 

C. Specify a link function when using binary 
regression 

i. Logit link for logistic regression 

ii. Complementary log-log link is an alternative 
to logistic regression 

D. Discrete survival analysis 

i. Time-to-event variable is discrete 

ii. Binary regression can be applied in a similar 
manner to that of interval-censored data 

XII. Frailty Models (pages 294-308) 

A. The frailty $\alpha$ is an unobserved multiplicative
effect on the hazard function

i. Hazard, conditioned on the frailty,
$h(t \mid \alpha) = \alpha h(t)$

ii. Survival, conditioned on the frailty,
$S(t \mid \alpha) = S(t)^{\alpha}$

B. Frailty assumed to follow some distribution
$g(\alpha)$ of mean 1 and variance $\theta$

i. The variance $\theta$ is a parameter estimated by
the data

ii. Gamma and inverse-Gaussian are 
distributions offered by Stata software 

C. Designed to account for unobserved
individual-level factors that influence survival

i. Distinction is made between the
individual-level and population-level hazards.
PH assumption may hold on individual level
but not on population level

D. Shared frailty models allow individuals to
share the same frailty

i. Play similar role as adding a random effect to
a linear regression

ii. Can account for within-group correlation.

XIII. Summary (pages 309-312)


Practice Exercises


Answer questions 1 to 5 as true or false (circle T or F) 

T F 1. The acceleration factor comparing exposed and 
unexposed subjects, (E = 1 vs. E = 0), is a ratio 
of their median survival times (time to S(t) = 0.5), 
or more generally the ratio of their times to any 
fixed value of S(t) = q. 

T F 2. Let $S_0(t)$ be the survival function for unexposed
subjects (E = 0) and let $S_1(t)$ be the survival function
for exposed subjects (E = 1). If $S_0(t) = S_1(3t)$
then the median survival time for the unexposed
subjects is 3 times longer than the median survival
time for the exposed subjects.

T F 3. The Cox proportional hazards model is a parametric
model.

T F 4. If the accelerated failure time (AFT) assumption
holds in a Weibull model then the proportional hazards
assumption also holds.

T F 5. The hazard is assumed constant in a log-logistic 
model. 

Questions 6 and 7 make use of the output (copied below) presented
in Sections III and V containing an example of the exponential
model. This example used the remission data with
treatment status (coded TRT = 1 for the experimental treatment
and TRT = 0 for the placebo). The exponential survival
and hazard functions are, respectively, $S(t) = \exp(-\lambda t)$ and
$h(t) = \lambda$, where $\lambda = \exp[-(\alpha_0 + \alpha_1 \mathrm{TRT})]$ for the AFT
parameterization and $\lambda = \exp(\beta_0 + \beta_1 \mathrm{TRT})$ for the PH
parameterization. The output for both the AFT and PH forms of the
model is presented.


Exponential regression, accelerated failure-time form
$\lambda = \exp[-(\alpha_0 + \alpha_1 \mathrm{TRT})]$

_t       Coef.    Std. Err.   z       p>|z|
trt      1.527    .398        3.83    0.00
_cons    2.159    .218        9.90    0.00

Exponential regression, log relative-hazard form
$\lambda = \exp(\beta_0 + \beta_1 \mathrm{TRT})$

_t       Coef.    Std. Err.   z       p>|z|
trt      -1.527   .398        -3.83   0.00
_cons    -2.159   .218        -9.90   0.00

6. In this chapter it was shown in an exponential model
that the time to event is $t = [-\log S(t)] \times (1/\lambda)$ given
a fixed value of S(t). Use the output from the AFT form
of the model to estimate the median survival time (in
weeks) for the treated group (TRT = 1) and the placebo
group (TRT = 0).

7. Use the output from the PH form of the model to estimate 
the median survival time for the treated group (TRT = 
1) and the placebo group (TRT = 0). Notice the answers 
from Questions 6 and 7 are the same, illustrating that 
the AFT and PH forms of the exponential model are just 
different parameterizations of the same model. 

Questions 8 to 11 refer to a log-logistic AFT model using the 
data from the Veteran’s Administration Lung Cancer Trial. 
The exposure of interest is treatment status TX (standard = 
1, test = 2). The control variables are performance status 
(PERF), disease duration (DD), AGE, and prior therapy 
(PRIORTX). These predictors are used in the section on 
frailty models. The outcome is time to death (in days). The 
output is shown below. 

Log-logistic regression -- accelerated failure-time form

Log likelihood = -200.196          LR chi2(5)  = 61.31
                                   Prob > chi2 = 0.0000

_t        Coef.       Std. Err.   z       p>|z|
tx        -.054087    .1863349    -0.29   0.772
perf      .0401825    .0046188    8.70    0.000
dd        .0042271    .0095831    0.44    0.659
age       .0086776    .0092693    0.94    0.349
priortx   .0032806    .0225789    0.15    0.884
_cons     1.347464    .6964462    1.93    0.053

/ln_gam   -.4831864   .0743015    -6.50   0.000
gamma     .6168149    .0458303




8. State the AFT log-logistic model in terms of S(t) (note 
gamma = 1/p). 

9. Estimate the acceleration factor $\gamma$ with a 95% confidence
interval comparing the test and standard treatment
(TX = 2 vs. TX = 1). Interpret your answer.

10. The AFT log-logistic model is also a proportional odds
model. Use the output to estimate the odds ratio (odds
of death) comparing the test and standard treatment.
Also estimate the survival odds ratio comparing the test
and standard treatment.

11. The Akaike Information Criterion (AIC) is a method designed
to compare the fit of different models. For this
question, three models are compared using the same 5
predictors:

1. A Weibull model without frailty (presented as Model 
1 in the section on frailty models); 

2. A Weibull model containing a frailty component (pre¬ 
sented as Model 2 in the section on frailty models); 
and 

3. The log-logistic model presented above. 

Below is a table containing the log likelihood statistic for 
each model. 


Model             Frailty   Number of parameters   Log likelihood
1. Weibull        No        7                      -206.204
2. Weibull        Yes       8                      -200.193
3. Log-logistic   No        7                      -200.196


The goal for this question is to calculate the AIC statistic
for each model and select the model based on this criterion.
The AIC statistic is calculated as: -2 log likelihood +
2p (where p is the number of parameters in the model). A
smaller AIC statistic suggests a better fit. The addition of
2 times p can be thought of as a penalty if nonpredictive
parameters are added to the model. Each model contains
the 5 predictors, an intercept, and a shape parameter.
Model 2 contains an additional variance parameter (theta)
because a frailty component is included in the model.
The log likelihood was unchanged when a frailty component
was added to the log-logistic model (not shown in table).

Note that if we are just comparing Models 1 and 2 we 
could use the likelihood ratio test because Model 1 is nested 
(contained) in Model 2. The likelihood ratio test is considered 
a superior method to the AIC for comparing models but 
cannot be used to compare the log-logistic model to the 
other two, because that model uses a different distribution. 

Which of the three models should be selected based on
the AIC?

Questions 12 to 14 refer to a generalized gamma model using
the Veterans' data with the same five predictor variables
that were used in the model for Questions 8 to 10. The generalized
gamma distribution contains two shape parameters
(kappa and sigma) that allow great flexibility in the shape of
the hazard. If kappa = 1, the model reduces to a Weibull distribution
with p = 1/sigma. If kappa = 0 the model reduces
to a lognormal distribution. The output is shown below.

Gamma regression -- accelerated failure-time form

Log likelihood = -200.626          LR chi2(5)  = 52.86
                                   Prob > chi2 = 0.0000

_t        Coef.    Std. Err.   z       p>|z|
tx        -.131    .1908       -0.69   0.491
perf      .039     .0051       7.77    0.000
dd        .0004    .0097       0.04    0.965
age       .008     .0095       0.89    0.376
priortx   .004     .0229       0.17    0.864
_cons     1.665    .7725       2.16    0.031

/ln_sig   .0859    .0654       1.31    0.189
/kappa    .2376    .2193       1.08    0.279
sigma     1.0898   .0714




12. Estimate the acceleration factor $\gamma$ with a 95% confidence
interval comparing the test and standard treatment (TX
= 2 vs. TX = 1).

13. Use the output to test the null hypothesis that a lognormal
distribution is appropriate for this model.

14. A lognormal model was run with the same five predictors
(output not shown) and yielded very similar parameter
estimates to those obtained from the generalized
gamma model shown above. The value of the log likelihood
for the lognormal model was -201.210. Compare
the AIC of the generalized gamma model, the lognormal
model, and the log-logistic model from Question 11 and
select a model based on that criterion. Note: each model
contains an intercept and five predictors. The generalized
gamma distribution contains two additional shape
parameters and the log-logistic and lognormal distributions
each contain one additional shape parameter (see
Question 11 for further details on the AIC).

Questions 15 to 17 refer to a Weibull model using the
remission data with treatment as the only predictor (coded
TRT = 1 for the test treatment and TRT = 0 for the placebo).
In this model both $\lambda$ and p are modeled as functions of
the predictor TRT. The model can be stated in terms of the
hazard function: $h(t) = \lambda p t^{p-1}$ where $\lambda = \exp(\beta_0 + \beta_1 \mathrm{TRT})$
and $p = \exp(\delta_0 + \delta_1 \mathrm{TRT})$. Typically, the shape parameter in
a Weibull model is assumed constant (i.e., $\delta_1 = 0$) across
levels of covariates. This model is discussed in the section of
this chapter called “Other Parametric Models.” The output
obtained using Stata is shown below.


Weibull regression -- log relative-hazard form

Log likelihood = -47.063396        LR chi2(1)  = 1.69
                                   Prob > chi2 = 0.1941

_t          Coef.    Std. Err.   z       p>|z|
_t
  trt       -1.682   1.374       -1.22   0.221
  _cons     -3.083   .646        -4.77   0.000
ln_p
  trt       -.012    .328        -0.04   0.970
  _cons     .315     .174        1.82    0.069


15. Even though $\lambda$ is parameterized similarly to that in a PH
Weibull model, this model is not a PH model because
the shape parameter p varies across treatment groups.
Show the PH assumption is violated in this model by
estimating the hazard ratios for TRT = 0 vs. TRT = 1
after 10 weeks and after 20 weeks of follow-up.

16. Perform a statistical test on the hypothesis $\delta_1 = 0$ (the
coefficient for the treatment term for ln(p)). Note: if we
assume $\delta_1 = 0$, then the model reduces to the example
of the Weibull PH model presented in Section VI of this
chapter.

17. Consider the plot of the log negative log of the Kaplan-Meier
survival estimates against the log of time for
TRT = 1 and TRT = 0. How should the graph look if
$\delta_1 = 0$?

Test


Answer the following true or false questions (circle T 
or F). 

T F 1. The accelerated failure time model and proportional
hazards model are both additive models.

T F 2. If the survival function is known then the hazard 
function can be ascertained (and vice versa). 

T F 3. If survival time follows a Weibull distribution 
then a plot of the ln[-ln S(t)] against ln(t) should 
be a straight line. 

T F 4. If the accelerated failure time (AFT) assumption
holds in a log-logistic model then the proportional
hazards assumption also holds.

T F 5. If the acceleration factor for the effect of an exposure
(exposed vs. unexposed) is greater than one,
then the exposure is harmful to survival.

T F 6. Let $S_0(t)$ be the survival function for unexposed
subjects (E = 0) and let $S_1(t)$ be the survival function
for exposed subjects (E = 1). If $\gamma$ is the acceleration
factor comparing E = 1 vs. E = 0 then
$S_0(t) = S_1(\gamma t)$.

T F 7. Frailty models are designed to provide an approach
to account for unobserved individual-level
characteristics.

T F 8. If you include a gamma distributed frailty component
in the model, then you will see an additional
parameter estimate for the variance of the frailty
in the model output.

T F 9. If survival time T follows a Weibull distribution, 
then ln(T) also follows a Weibull distribution. 

T F 10. If a subject is lost to follow-up after five years, 
then the subject is left-censored. 

Questions 11 to 17 refer to a Weibull model run with the “addicts”
dataset. The predictor of interest is CLINIC (coded 1 or
2) for two methadone clinics for heroin addicts. Covariates
include DOSE (continuous) for methadone dose (mg/day),
PRISON (coded 1 if patient has a prison record and 0 if
not), and a prison-dose product term (called PRISDOSE).
The outcome is time (in days) until the person dropped out
of the clinic or was censored. The Weibull survival and hazard
functions are, respectively, $S(t) = \exp(-\lambda t^p)$ and
$h(t) = \lambda p t^{p-1}$, where
$\lambda^{1/p} = \exp[-(\alpha_0 + \alpha_1 \mathrm{CLINIC} + \alpha_2 \mathrm{PRISON} + \alpha_3 \mathrm{DOSE} + \alpha_4 \mathrm{PRISDOSE})]$
for the AFT parameterization and
$\lambda = \exp[\beta_0 + \beta_1 \mathrm{CLINIC} + \beta_2 \mathrm{PRISON} + \beta_3 \mathrm{DOSE} + \beta_4 \mathrm{PRISDOSE}]$
for the PH parameterization. The Stata output for both the AFT
and PH forms of the model is presented as follows:

Weibull regression
accelerated failure-time form

Log likelihood = -260.74854

_t         Coef.      Std. Err.   z       p>|z|
clinic     .698       .158        4.42    0.000
prison     .145       .558        0.26    0.795
dose       .027       .006        4.60    0.000
prisdose   -.006      .009        -0.69   0.492
_cons      3.977      .376        10.58   0.000

/ln_p      .315       .068        4.67    0.000
p          1.370467
1/p        .729678

Weibull regression
log relative-hazard form

Log likelihood = -260.74854

_t         Coef.      Std. Err.   z       p>|z|
clinic     -.957      .213        -4.49   0.000
prison     -.198      .765        -0.26   0.795
dose       -.037      .008        -4.63   0.000
prisdose   .009       .013        0.69    0.491
_cons      -5.450     .702        -7.76   0.000

/ln_p      .315       .068        4.67    0.000
p          1.370467
1/p        .729678





11. Estimate the acceleration factor with a 95% confidence 
interval comparing CLINIC = 2 vs. CLINIC = 1. Interpret 
this result. 

12. Estimate the hazard ratio with a 95% confidence interval 
comparing CLINIC = 2 vs. CLINIC = 1. Interpret this 
result. 

13. Estimate the coefficient for CLINIC in the PH Weibull
model using the results reported in the output from
the AFT form of the model. Hint: the coefficients for a
Weibull PH and AFT model are related by $\beta_j = -\alpha_j p$ for
the jth covariate.

14. Is the product term PRISDOSE included in the model to
account for potential interaction or potential confounding
of the effect of CLINIC on survival?

15. Use the output to estimate the median survival time for
a patient from CLINIC = 2 who has a prison record and
receives a methadone dose of 50 mg/day. Hint: use the
relationship that $t = [-\ln S(t)]^{1/p} \times (1/\lambda^{1/p})$ for a Weibull
model.

16. Use the output to estimate the median survival time for 
a patient from CLINIC = 1 who has a prison record and 
receives a methadone dose of 50 mg/day. 

17. What is the ratio of your answers from Questions 15
and 16 and how does this ratio relate to the acceleration
factor?

Questions 18 and 19 refer to the Weibull model (in AFT form)
that was used for the previous set of questions (Questions
11 to 17). The only difference is that a frailty component is
now included in the model. A gamma distribution of mean
1 and variance theta is assumed for the frailty. The output
shown below contains one additional parameter
estimate (for theta).

Weibull regression
accelerated failure-time form
Gamma frailty

Log likelihood = -260.74854

_t         Coef.       Std. Err.   z       p>|z|
clinic     .698        .158        4.42    0.000
prison     .145        .558        0.26    0.795
dose       .027        .006        4.60    0.000
prisdose   -.006       .009        -0.69   0.492
_cons      3.977       .376        10.58   0.000

/ln_p      .315        .068        4.67    0.000
p          1.370467
1/p        .729678
theta      .00000002   .0000262

Likelihood ratio test of theta=0:
chibar2(01) = 0.00
Prob>=chibar2 = 1.000


18. Did the addition of the frailty component change any 
of the other parameter estimates (besides theta)? Did it 
change the log likelihood? 

19. A likelihood ratio test for the hypothesis $H_0$: theta = 0
yields a p-value of 1.0 (bottom of the output). The parameter
estimate for theta is essentially zero. What does
it mean if theta = 0?

Answers to Practice Exercises


1. T 

2. F: The median survival time for the unexposed is 1/3 of the 
median survival time for the exposed. 

3. F: The Cox model is a semiparametric model. The distribution
of survival time is unspecified in a Cox model.

4. T 

5. F: The hazard is assumed constant in an exponential 
model. 

6. $t = [-\log S(t)] \times (1/\lambda)$, where S(t) = 0.5, and
$1/\lambda = \exp(\alpha_0 + \alpha_1 \mathrm{TRT})$.

For TRT = 0: estimated median survival
= [-ln(0.5)] exp(2.159) = 6.0 weeks.

For TRT = 1: estimated median survival
= [-ln(0.5)] exp(2.159 + 1.527) = 27.6 weeks.

7. $t = [-\log S(t)] \times (1/\lambda)$, where S(t) = 0.5, and
$\lambda = \exp(\beta_0 + \beta_1 \mathrm{TRT}) \Rightarrow 1/\lambda = \exp[-(\beta_0 + \beta_1 \mathrm{TRT})]$.

For TRT = 0: estimated median survival
= [-ln(0.5)] exp[-(-2.159)] = 6.0 weeks.

For TRT = 1: estimated median survival
= [-ln(0.5)] exp[-(-2.159 - 1.527)] = 27.6 weeks.
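
The arithmetic in answers 6 and 7 can be reproduced with a few lines of Python. This is a minimal sketch: the coefficient values are the ones printed in the exponential regression output, while the function name is made up for the illustration.

```python
import math

def exponential_median(lam):
    """Median of an exponential model: t = [-ln(0.5)] * (1 / lam)."""
    return -math.log(0.5) / lam

# AFT output: lam = exp[-(alpha0 + alpha1 * TRT)]
alpha0, alpha1 = 2.159, 1.527
# PH output: lam = exp(beta0 + beta1 * TRT)
beta0, beta1 = -2.159, -1.527

for trt in (0, 1):
    median_aft = exponential_median(math.exp(-(alpha0 + alpha1 * trt)))
    median_ph = exponential_median(math.exp(beta0 + beta1 * trt))
    print(f"TRT={trt}: AFT median = {median_aft:.1f} weeks, "
          f"PH median = {median_ph:.1f} weeks")
```

Both parameterizations give the same medians because the PH coefficients are the negatives of the AFT coefficients in an exponential model.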

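8. $S(t) = 1/(1 + \lambda t^p)$ where
$\lambda^{1/p} = \exp[-(\alpha_0 + \alpha_1 \mathrm{TX} + \alpha_2 \mathrm{PERF} + \alpha_3 \mathrm{DD} + \alpha_4 \mathrm{AGE} + \alpha_5 \mathrm{PRIORTX})]$.

9. $\gamma = \frac{\exp[\alpha_0 + \alpha_1(2) + \alpha_2 \mathrm{PERF} + \alpha_3 \mathrm{DD} + \alpha_4 \mathrm{AGE} + \alpha_5 \mathrm{PRIORTX}]}{\exp[\alpha_0 + \alpha_1(1) + \alpha_2 \mathrm{PERF} + \alpha_3 \mathrm{DD} + \alpha_4 \mathrm{AGE} + \alpha_5 \mathrm{PRIORTX}]} = \exp(\alpha_1)$

$\hat{\gamma}$ = exp(-0.054087) = 0.95

95% CI = exp[-0.054087 ± 1.96(0.1863349)] = (0.66, 1.36)

The point estimate along with the 95% CI suggests a null
result.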
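
The point estimate and Wald confidence interval for the acceleration factor can be computed as follows. This is a small sketch using the TX coefficient and standard error from the log-logistic output; the helper name `ratio_with_ci` is hypothetical.

```python
import math

def ratio_with_ci(coef, std_err, z_crit=1.96):
    """Exponentiate a coefficient and its Wald confidence limits."""
    point = math.exp(coef)
    lower = math.exp(coef - z_crit * std_err)
    upper = math.exp(coef + z_crit * std_err)
    return point, lower, upper

gamma_hat, lower, upper = ratio_with_ci(-0.054087, 0.1863349)
print(f"acceleration factor = {gamma_hat:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The interval is built on the coefficient scale and then exponentiated, which is why it is not symmetric around the point estimate.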

10. The coefficients for a log-logistic proportional odds (PO)
and AFT model are related by $\beta_1 = -\alpha_1 p = -\alpha_1 \div \text{gamma}$,
where $\beta_1$ is the coefficient for TX in a PO model.

OR $= \exp(-\alpha_1 \div \text{gamma})$

estimated OR = exp[-(-0.054087) ÷ 0.6168149] = 1.09
estimated survival OR = 1/exp[-(-0.054087) ÷ 0.6168149]
= 0.92.

11. The AIC statistic is calculated as -2 log likelihood + 2p
(where p is the number of parameters in the model). A
smaller AIC statistic suggests a better fit. The AIC statistic
is shown below for each of the three models.

Model             Frailty   Number of parameters   Log likelihood   AIC
1. Weibull        No        7                      -206.204         426.408
2. Weibull        Yes       8                      -200.193         416.386
3. Log-logistic   No        7                      -200.196         414.392

Based on the AIC, the log-logistic model is selected
yielding the smallest AIC statistic at 414.392.
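
The AIC comparison can be reproduced directly from the log likelihoods and parameter counts given in the question. The sketch below is illustrative; the dictionary labels are chosen for this example.

```python
def aic(log_likelihood, n_params):
    """AIC = -2 * log likelihood + 2 * (number of parameters)."""
    return -2.0 * log_likelihood + 2.0 * n_params

# (log likelihood, number of parameters) for each model in the table.
models = {
    "1. Weibull (no frailty)": (-206.204, 7),
    "2. Weibull (frailty)": (-200.193, 8),
    "3. Log-logistic (no frailty)": (-200.196, 7),
}

scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
for name, score in scores.items():
    print(f"{name}: AIC = {score:.3f}")

best = min(scores, key=scores.get)  # smaller AIC suggests a better fit
print("selected:", best)
```

Models 2 and 3 have nearly identical log likelihoods, so the penalty for the extra frailty variance parameter decides the comparison.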


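12. $\gamma = \frac{\exp[\alpha_0 + \alpha_1(2) + \alpha_2 \mathrm{PERF} + \alpha_3 \mathrm{DD} + \alpha_4 \mathrm{AGE} + \alpha_5 \mathrm{PRIORTX}]}{\exp[\alpha_0 + \alpha_1(1) + \alpha_2 \mathrm{PERF} + \alpha_3 \mathrm{DD} + \alpha_4 \mathrm{AGE} + \alpha_5 \mathrm{PRIORTX}]} = \exp(\alpha_1)$

$\hat{\gamma}$ = exp(-0.131) = 0.88

95% CI = exp[-0.131 ± 1.96(0.1908)] = (0.60, 1.28)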

13. The generalized gamma distribution reduces to a lognormal
distribution if kappa = 0.

$H_0$: kappa = 0

Wald test statistic: z = 0.2376 / 0.2193 = 1.08 (from output)

p-value: 0.279 (from output)

Conclusion: p-value not significant at a significance level
of 0.05. Not enough evidence to reject $H_0$. The lognormal
distribution may be appropriate.
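
The Wald statistic and its two-sided p-value can be checked with the Python standard library alone. This is a sketch under the usual assumption that the statistic is referred to a standard normal distribution; the function name is illustrative.

```python
import math

def wald_test(coef, std_err):
    """Return the Wald z statistic and its two-sided standard normal p-value."""
    z = coef / std_err
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# kappa estimate and standard error from the generalized gamma output
z, p_value = wald_test(0.2376, 0.2193)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

The same function applies to any coefficient in the output, for example a treatment term in a model for ln(p).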

14. The AIC statistic is shown below for the generalized
gamma, lognormal, and log-logistic models.

Model               Number of parameters   Log likelihood   AIC
Generalized Gamma   8                      -200.626         417.252
Lognormal           7                      -201.210         416.420
Log-logistic        7                      -200.196         414.392

As in Question 11, the log-logistic model is selected yielding
the smallest AIC at 414.392.


15. $h(t) = \lambda p t^{p-1}$ where $\lambda = \exp(\beta_0 + \beta_1 \mathrm{TRT})$ and
$p = \exp(\delta_0 + \delta_1 \mathrm{TRT})$

let $\lambda_0 = \exp[\beta_0 + \beta_1(0)]$, $\lambda_1 = \exp[\beta_0 + \beta_1(1)]$
let $p_0 = \exp[\delta_0 + \delta_1(0)]$, $p_1 = \exp[\delta_0 + \delta_1(1)]$

$\hat{\lambda}_0 = 0.0458$, $\hat{\lambda}_1 = 0.0085$, $\hat{p}_0 = 1.3703$,
$\hat{p}_1 = 1.3539$ (calculated using output)

HR (TRT = 0 vs. TRT = 1) $= \frac{\lambda_0 p_0 t^{p_0 - 1}}{\lambda_1 p_1 t^{p_1 - 1}}$

HR (as a function of t) $= \frac{(0.0458)(1.3703)\, t^{0.3703}}{(0.0085)(1.3539)\, t^{0.3539}}$

HR (t = 10) $= \frac{(0.0458)(1.3703)(10^{0.3703})}{(0.0085)(1.3539)(10^{0.3539})} = 5.66$

HR (t = 20) $= \frac{(0.0458)(1.3703)(20^{0.3703})}{(0.0085)(1.3539)(20^{0.3539})} = 5.73$

The estimated hazard ratios for TRT at 10 weeks and at 20
weeks are different, demonstrating that the hazards are not
constrained to be proportional in this model. However, the
estimated hazard ratios are just slightly different, suggesting
that the PH assumption is probably reasonable.
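
The two hazard ratios can be reproduced from the rounded estimates quoted in this answer. The sketch below is illustrative only; the function names are chosen for the example, and using unrounded estimates would change the second decimal slightly.

```python
def weibull_hazard(t, lam, p):
    """Weibull hazard h(t) = lam * p * t^(p - 1)."""
    return lam * p * t ** (p - 1)

# Rounded estimates quoted in the answer.
lam0, p0 = 0.0458, 1.3703   # TRT = 0
lam1, p1 = 0.0085, 1.3539   # TRT = 1

def hazard_ratio(t):
    """HR (TRT = 0 vs. TRT = 1) as a function of time t (weeks)."""
    return weibull_hazard(t, lam0, p0) / weibull_hazard(t, lam1, p1)

for t in (10, 20):
    print(f"HR(t = {t}) = {hazard_ratio(t):.2f}")
```

Because the two shape parameters differ, the ratio depends on t through the factor t raised to the power (p0 - p1); it would be constant only if the shapes were equal.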


16. $H_0$: $\delta_1 = 0$

Wald test statistic: z = -0.0123083 / 0.328174 = -0.04 (from output)

p-value: 0.970 (from output)

Conclusion: p-value is not significant. No evidence to reject
$H_0$. The PH assumption is reasonable.


17. If the Weibull assumption is met, then the plots should be
straight lines with slope p. If $\delta_1 = 0$, then the slope p is the
same for TRT = 1 and TRT = 0 and the lines are parallel.

Introduction

This chapter considers outcome events that may occur more
than once over the follow-up time for a given subject. Such
events are called “recurrent events.” Modeling this type of
data can be carried out using a Cox PH model with the data
layout constructed so that each subject has a line of data
corresponding to each recurrent event. A variation of this
approach uses a stratified Cox PH model, which stratifies
on the order in which recurrent events occur. Regardless of
which approach is used, the researcher should consider adjusting
the variances of estimated model coefficients for the
likely correlation among recurrent events on the same subject.
Such adjusted variance estimates are called “robust variance
estimates.” A parametric approach for analyzing recurrent
event data that includes a frailty component (introduced
in Chapter 7) is also described and illustrated.

Abbreviated Outline

The outline below gives the user a preview of the material to
be covered by the presentation. A detailed outline for review
purposes follows the presentation.

I. Overview (page 334) 

II. Examples of Recurrent Event Data 
(pages 334-336) 

III. Counting Process Example (pages 336-337) 

IV. General Data Layout for Counting Process 
Approach (pages 338-339) 

V. The Counting Process Model and Method 
(pages 340-344) 

VI. Robust Estimation (pages 344-346) 

VII. Results for CP Example (pages 346-347) 

VIII. Other Approaches—Stratified Cox 
(pages 347-353) 

IX. Bladder Cancer Study Example (Continued) 
(pages 353-357) 

X. A Parametric Approach Using Shared Frailty 
(pages 357-359) 

XI. A Second Example (pages 359-364) 

XII. Survival Curves with Recurrent Events 
(pages 364-367) 

XIII. Summary (pages 367-370)

Objectives

Upon completing this chapter, the learner should be able to:

1. State or recognize examples of recurrent event data.

2. State or recognize the form of the data layout used for the
counting process approach for analyzing correlated data.

3. Given recurrent event data, outline the steps needed to analyze
such data using the counting process approach.

4. State or recognize the form of the data layout used for the
marginal model approach for analyzing correlated data.

5. Given recurrent event data, outline the steps needed to analyze
such data using the marginal model approach.

Presentation


I. Overview 


Outcome occurs more 
than once per subject: 
RECURRENT 
EVENTS 

(Counting Process and 
Other Approaches) 


In this chapter we consider outcome events that
may occur more than once over the follow-up time
for a given subject. Such events are called “recurrent
events.” We focus on the Counting Process
(CP) approach for analysis of such data that uses
the Cox PH model, but we also describe alternative
approaches that use a Stratified Cox (SC) PH
model and a frailty model.


II. Examples of Recurrent 
Event Data 

1. Multiple relapses from 
remission—leukemia patients 

2. Repeated heart attacks— 
coronary patients 

3. Recurrence of tumors—bladder 
cancer patients 

4. Deteriorating episodes of visual 
acuity—macular degeneration 
patients 


Up to this point, we have assumed that the event
of interest can occur only once for a given subject.
However, in many research scenarios in which the
event of interest is not death, a subject may experience
an event several times over follow-up. Examples
of recurrent event data include:

1. Multiple episodes of relapses from remission
comparing different treatments for leukemia
patients.

2. Recurrent heart attacks of coronary patients being
treated for heart disease.

3. Recurrence of bladder cancer tumors in a cohort
of patients randomized to one of two treatment
groups.

4. Multiple events of deteriorating visual acuity
in patients with baseline macular degeneration,
where each recurrent event is considered
a more clinically advanced stage of a previous
event.


Objective

Assess relationship of predictors to
rate of occurrence, allowing for multiple
events per subject


For each of the above examples, the event of interest
differs, but nevertheless may occur more than
once per subject. A logical objective for such data
is to assess the relationship of relevant predictors
to the rate at which events are occurring, allowing
for multiple events per subject.

LEUKEMIA EXAMPLE


Do treatment groups differ in rates of 
relapse from remission? 


In the leukemia example above, we might ask
whether persons in one treatment group are experiencing
relapse episodes at a higher rate than
persons in a different treatment group.


HEART ATTACK EXAMPLE 


Do smokers have a higher heart attack rate 
than nonsmokers? 


If the recurrent event is a heart attack, we might
ask, for example, whether smokers are experiencing
heart attack episodes at a higher rate than
nonsmokers.


LEUKEMIA AND HEART ATTACK 
EXAMPLES 


All events are of the same type 
The order of events is not important 


Heart attacks: Treat as identical events; 

Don't distinguish among 
1st, 2nd, 3rd, etc. attack 


For either of the above two examples, we are treating
all events as if they were the same type. That is,
the occurrence of an event on a given subject identifies
the same disease without considering more
specific qualifiers such as severity or stage of disease.
We also are not taking into account the order
in which the events occurred.

For example, we may wish to treat all heart attacks,
whether on the same or different subjects,
as identical types of events, and we don’t wish to
identify whether a given heart attack episode was
the first, or the second, or the third event that occurred
on a given subject.


BLADDER CANCER EXAMPLE 


Compare overall tumor recurrence rate 
without considering order or type of tumor 


The third example, which considers recurrence of
bladder cancer tumors, can be considered similarly.
That is, we may be interested in assessing
the “overall” tumor recurrence rate without distinguishing
either the order or type of tumor.


MACULAR DEGENERATION OF 
VISUAL ACUITY EXAMPLE 


A second or higher event is more severe 
than its preceding event 


Order of event is important 


The fourth example, dealing with macular degeneration
events, however, differs from the other examples.
The recurrent events on the same subject
differ in that a second or higher event indicates a
more severe degenerative condition than its preceding
event.

Consequently, the investigator in this scenario
may wish to do separate analyses for each ordered
event in addition to or instead of treating all recurrent
events as identical.

Use a different analysis depending
on whether

a. recurrent events are treated as 
identical 

b. recurrent events involve different 
disease categories and/or the 
order of events is important 


We have thus made an important distinction to 
be considered in the analysis of recurrent event 
data. If all recurrent events on the same subject 
are treated as identical, then the analysis required 
of such data is different than what is required if 
either recurrent events involve different disease 
categories and/or the order that events reoccur is 
considered important. 


Recurrent events identical 

Counting Process Approach 
(Andersen et al., 1993) 


The approach to analysis typically used when 
recurrent events are treated as identical is called 
the Counting Process Approach (Andersen et al., 
1993). 


Recurrent events: different disease 
categories or event order important 

Stratified Cox (SC) Model 
Approaches 


When recurrent events involve different disease 
categories and/or the order of events is considered 
important, a number of alternative approaches to 
analysis have been proposed that involve using 
stratified Cox (SC) models. 


In this chapter, we focus on the Counting Process 
(CP) approach, but also describe the other strati¬ 
fied Cox approaches (in a later section). 


III. Counting Process Example 


Table 8.1. 2 Hypothetical Subjects 
Bladder Cancer Tumor Events 

         Time        Event        Treatment 
         interval    indicator    group 

Al       0 to 3      1            1 
         3 to 9      1            1 
         9 to 21     1            1 
         21 to 23    0            1 
Hal      0 to 3      1            0 
         3 to 15     1            0 
         15 to 25    1            0 


To illustrate the counting process approach, 
we consider data on two hypothetical subjects 
(Table 8.1), Al and Hal, from a randomized trial 
that compares two treatments for bladder cancer 
tumors. 

Al gets recurrent bladder cancer tumors at months 
3, 9, and 21, and is without a bladder cancer tumor 
at month 23, after which he is no longer followed. 
Al received the treatment coded as 1. 

Hal gets recurrent bladder cancer tumors at 
months 3, 15, and 25, after which he is no longer 
followed. Hal received the treatment coded as 0. 
                                  Al           Hal 

No. recurrent events              3            3 
Follow-up time                    23 months    25 months 
Event times from start            3, 9, 21     3, 15, 25 
of follow-up 
Additional months of follow-up    2 months     0 months 
after last event 


Table 8.2. Example of Data Layout 
for Counting Process Approach 

        Interval   Time    Time   Event    Treatment 
Subj    Number     Start   Stop   Status   Group 

Al      1          0       3      1        1 
Al      2          3       9      1        1 
Al      3          9       21     1        1 
Al      4          21      23     0        1 
Hal     1          0       3      1        0 
Hal     2          3       15     1        0 
Hal     3          15      25     1        0 


Counting process: Start and Stop 
times 

Standard layout: only Stop 
(survival) times (no recurrent 
events) 

        Interval   Time    Time   Event    Treatment 
Subj    Number     Start   Stop   Status   Group 

Sal     1          0       17     1        0 
Mai     1          0       12     0        1 


Al has experienced 3 events of the same type 
(i.e., recurrent bladder tumors) over a follow-up 
period of 23 months. Hal has also experienced 
3 events of the same type over a follow-up period 
of 25 months. 

The 3 events experienced by Al occurred at differ¬ 
ent survival times (from the start of initial follow¬ 
up) from the 3 events experienced by Hal. 

Also, Al had an additional 2 months of follow-up 
after his last recurrent event during which time 
no additional event occurred. In contrast, Hal had 
no additional event-free follow-up time after his 
last recurrent event. 

In Table 8.2, we show how the data for these 
2 subjects would be set up for computer analysis 
using the counting process approach. Each subject 
contributes one line of data for each time interval 
that ends in a recurrent event, plus one line for 
any additional event-free follow-up interval. 
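
The layout rule just described can be sketched in a few lines of Python. This is a minimal illustration rather than code from the Computer Appendix; the function name counting_process_rows and the tuple order (subject, interval, start, stop, event, treatment) are our own assumptions. It reproduces the rows of Table 8.2 from each subject's event times and end of follow-up.

```python
def counting_process_rows(subject, event_times, end_of_followup, treatment):
    """Build one (subject, interval, start, stop, event, treatment) row per interval.

    Hypothetical helper: each recurrent event closes an interval with event = 1;
    leftover event-free follow-up adds a final interval with event = 0.
    """
    rows = []
    start = 0
    for interval, stop in enumerate(sorted(event_times), start=1):
        rows.append((subject, interval, start, stop, 1, treatment))
        start = stop
    # Add the censored interval only if follow-up continues past the last event.
    if end_of_followup > start:
        rows.append((subject, len(rows) + 1, start, end_of_followup, 0, treatment))
    return rows


al = counting_process_rows("Al", [3, 9, 21], 23, 1)
hal = counting_process_rows("Hal", [3, 15, 25], 25, 0)

for row in al + hal:
    print(row)
```

Al receives four lines because he has 2 months of event-free follow-up after his last tumor; Hal receives three because his follow-up ends at his last event. A subject with a single failure (Sal) or with censoring only (Mai) yields a single line.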


A distinguishing feature of the data layout for the 
counting process approach is that each line of data 
for a given subject lists the start time and stop 
time for each interval of follow-up. This contrasts 
with the standard layout for data with no recurrent 
events, which lists only the stop (survival) time. 

Note that if a third subject, Sal, failed without fur¬ 
ther events or follow-up occurring, then Sal con¬ 
tributes only one line of data, as shown at the left. 
Similarly, only one line of data is contributed by a 
(fourth) subject, Mai, who was censored without 
having failed at any time during follow-up. 
IV. General Data Layout: 
Counting Process 
Approach 

N subjects 

$r_i$ time intervals for subject i 

$\delta_{ij}$ event status (0 or 1) for subject 
i in interval j 

$t_{ij0}$ start time for subject i in 
interval j 

$t_{ij1}$ stop time for subject i in 
interval j 

$X_{ijk}$ value of kth predictor for 
subject i in interval j 

i = 1, 2, ..., N; j = 1, 2, ..., $r_i$; 
k = 1, 2, ..., p 

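Table 8.3. General Data Layout: 
CP Approach 

Subject   Interval   Status             Start          Stop           Predictors 
i         j          $\delta_{ij}$      $t_{ij0}$      $t_{ij1}$      $X_{ij1}$ ... $X_{ijp}$ 

1         1          $\delta_{11}$      $t_{110}$      $t_{111}$      $X_{111}$ ... $X_{11p}$ 
1         2          $\delta_{12}$      $t_{120}$      $t_{121}$      $X_{121}$ ... $X_{12p}$ 
...
1         $r_1$      $\delta_{1r_1}$    $t_{1r_10}$    $t_{1r_11}$    $X_{1r_11}$ ... $X_{1r_1p}$ 
...
i         1          $\delta_{i1}$      $t_{i10}$      $t_{i11}$      $X_{i11}$ ... $X_{i1p}$ 
i         2          $\delta_{i2}$      $t_{i20}$      $t_{i21}$      $X_{i21}$ ... $X_{i2p}$ 
...
i         $r_i$      $\delta_{ir_i}$    $t_{ir_i0}$    $t_{ir_i1}$    $X_{ir_i1}$ ... $X_{ir_ip}$ 
...
N         1          $\delta_{N1}$      $t_{N10}$      $t_{N11}$      $X_{N11}$ ... $X_{N1p}$ 
N         2          $\delta_{N2}$      $t_{N20}$      $t_{N21}$      $X_{N21}$ ... $X_{N2p}$ 
...
N         $r_N$      $\delta_{Nr_N}$    $t_{Nr_N0}$    $t_{Nr_N1}$    $X_{Nr_N1}$ ... $X_{Nr_Np}$ 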

The general data layout for the counting process 
approach is presented in Table 8.3 for a dataset 
involving N subjects. 

The ith subject has $r_i$ time intervals. 
$\delta_{ij}$ denotes the event status (1 = failure, 
0 = censored) for the ith subject in the jth time 
interval. $t_{ij0}$ and $t_{ij1}$ denote the start and 
stop times, respectively, for the ith subject in the 
jth interval. $X_{ijk}$ denotes the value of the kth 
predictor for the ith subject in the jth interval. 


Subjects are not restricted to have the same number 
of time intervals (e.g., $r_1$ does not have to equal 
$r_2$) or the same number of recurrent events. If the 
last time interval for a given subject ends in 
censorship ($\delta_{ir_i} = 0$), then the number of 
recurrent events for this subject is $r_i - 1$; 
previous time intervals, however, usually end with 
a failure ($\delta_{ij} = 1$). 

Also, start and stop times may be different for dif¬ 
ferent subjects. (See the previous section's exam¬ 
ple involving two subjects.) 

As with any survival data, the covariates (i.e., Xs) 
may be time-independent or time-dependent for 
a given subject. For example, if one of the Xs is 
“gender” (1 = female, 0 = male), the values of this 
variable will be all 1s or all 0s over all time 
intervals observed for a given subject. If another X 
variable is, say, a measure of daily stress level, the 
values of this variable are likely to vary over the 
time intervals for a given subject. 

The second column (“Interval j”) in the data layout 
is not needed for the CP analysis, but is required 
for other approaches described later. 


Table 8.4. First 26 Subjects: 
Bladder Cancer Study 

id   int   event   start   stop   tx   num   size 

 1    1      0       0       0     0    1     1 
 2    1      0       0       1     0    1     3 
 3    1      0       0       4     0    2     1 
 4    1      0       0       7     0    1     1 
 5    1      0       0      10     0    5     1 
 6    1      1       0       6     0    4     1 
 6    2      0       6      10     0    4     1 
 7    1      0       0      14     0    1     1 
 8    1      0       0      18     0    1     1 
 9    1      1       0       5     0    1     3 
 9    2      0       5      18     0    1     3 
10    1      1       0      12     0    1     1 
10    2      1      12      16     0    1     1 
10    3      0      16      18     0    1     1 
11    1      0       0      23     0    3     3 
12    1      1       0      10     0    1     3 
12    2      1      10      15     0    1     3 
12    3      0      15      23     0    1     3 
13    1      1       0       3     0    1     1 
13    2      1       3      16     0    1     1 
13    3      1      16      23     0    1     1 
14    1      1       0       3     0    3     1 
14    2      1       3       9     0    3     1 
14    3      1       9      21     0    3     1 
14    4      0      21      23     0    3     1 
15    1      1       0       7     0    2     3 
15    2      1       7      10     0    2     3 
15    3      1      10      16     0    2     3 
15    4      1      16      24     0    2     3 
16    1      1       0       3     0    1     1 
16    2      1       3      15     0    1     1 
16    3      1      15      25     0    1     1 
17    1      0       0      26     0    1     2 
18    1      1       0       1     0    8     1 
18    2      0       1      26     0    8     1 
19    1      1       0       2     0    1     4 
19    2      1       2      26     0    1     4 
20    1      1       0      25     0    1     2 
20    2      0      25      28     0    1     2 
21    1      0       0      29     0    1     4 
22    1      0       0      29     0    1     2 
23    1      0       0      29     0    4     1 
24    1      1       0      28     0    1     6 
24    2      1      28      30     0    1     6 
25    1      1       0       2     0    1     5 
25    2      1       2      17     0    1     5 
25    3      1      17      22     0    1     5 
25    4      0      22      30     0    1     5 
26    1      1       0       3     0    2     1 
26    2      1       3       6     0    2     1 
26    3      1       6       8     0    2     1 
26    4      1       8      12     0    2     1 
26    5      0      12      30     0    2     1 


To illustrate the above general data layout, we 
present in Table 8.4 the data for the first 26 sub¬ 
jects from a study of recurrent bladder cancer 
tumors (Byar, 1980 and Wei, Lin, and Weissfeld, 
1989). The entire dataset contained 86 patients, 
each followed for a variable amount of time up to 
64 months. 

The repeated event being analyzed is the recur¬ 
rence of bladder cancer tumors after transurethral 
surgical excision. Each recurrence of new tumors 
was treated by removal at each examination. 

About 25% of the 86 subjects experienced four 
events. 

The exposure variable of interest is drug treat¬ 
ment status (tx, 0 = placebo, 1 = treatment with 
thiotepa). The covariates listed here are initial 
number of tumors (num) and initial size of tumors 
(size) in centimeters. The paper by Wei, Lin, and 
Weissfeld actually focuses on a different method 
of analysis (called “marginal”), which requires a 
different data layout than shown here. We later 
describe the “marginal” approach and its corre¬ 
sponding layout. 

In these data, it can be seen that 16 of these 
subjects (id #s 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 17, 18, 
20, 21, 22, 23) had no recurrent events, 4 subjects 
had 2 recurrent events (id #s 10, 12, 19, 24), 
4 subjects (id #s 13, 14, 16, 25) had 3 recurrent 
events, and 2 subjects (id #s 15, 26) had 
4 recurrent events. 

Moreover, 9 subjects (id #s 6, 9, 10, 12, 14, 18, 20, 
25, 26) were observed for an additional event-free 
time interval after their last event. Of these, 
4 subjects (id #s 6, 9, 18, 20) experienced only one 
event (i.e., no recurrent events). 
V. The Counting Process 
Model and Method 

Cox PH Model 

$h(t, X) = h_0(t) \exp[\sum \beta_i X_i]$ 

Need to 

Assess PH assumption for $X_i$ 
Consider stratified Cox or extended 
Cox if PH assumption not 
satisfied 

Use extended Cox for time-dependent 
variables 


Recurrent event data versus nonrecurrent event data 
(Likelihood function formed differently) 

Recurrent event data: 

Subjects with > 1 time interval remain in the 
risk set until last interval is completed 

Different lines of data are treated as 
independent even though several outcomes 
on the same subject 

Nonrecurrent event data: 

Subjects removed from risk set at time of 
failure or censorship 

Different lines of data are treated as 
independent because they come from 
different subjects 


The model typically used to carry out the Count¬ 
ing Process approach is the standard Cox PH 
model, once again shown here at the left. 

As usual, the PH assumption needs to be evalu¬ 
ated for any time-independent variable. A strati¬ 
fied Cox model or an extended Cox model would 
need to be used if one or more time-independent 
variables did not satisfy the PH assumption. Also, 
an extended Cox model would be required if inher¬ 
ently time-dependent variables were considered. 


The primary difference in the way the Cox model 
is used for analyzing recurrent event data versus 
nonrecurrent (one time interval per subject) data 
is the way several time intervals on the same sub¬ 
ject are treated in the formation of the likelihood 
function maximized for the Cox model used. 

To keep things simple, we assume that the data 
involve only time-independent variables satisfying 
the PH assumption. For recurrent survival data, a 
subject with more than one time interval remains 
in the risk set until his or her last interval, after 
which the subject is removed from the risk set. In 
contrast, for nonrecurrent event data, each subject 
is removed from the risk set at the time of failure 
or censorship. 

Nevertheless, for subjects with two or more in¬ 
tervals, the different lines of data contributed by 
the same subject are treated in the analysis as if 
they were independent contributions from differ¬ 
ent subjects, even though there are several out¬ 
comes on the same subject. 

In contrast, for the standard Cox PH model ap¬ 
proach for nonrecurrent survival data, different 
lines of data are treated as independent because 
they come from different subjects. 
Cox PH Model for CP Approach: 
Bladder Cancer Study 

$h(t, X) = h_0(t) \exp[\beta\, tx + \gamma_1 num + \gamma_2 size]$ 

where 

tx = 1 if thiotepa, 0 if placebo 
num = initial # of tumors 
size = initial size of tumors 


For the bladder cancer study described in 
Table 8.4, the basic Cox PH model fit to these data 
takes the form shown at the left. 


The primary (exposure) variable of interest in this 
model is the treatment variable tx. The variables 
num and size are considered as potential 
confounders. All three variables are 
time-independent variables. 


No-interaction Model 

Interaction model would involve 
product terms 

tx × num and/or tx × size 


This is a no-interaction model because it does not 
contain product terms of the form tx × num or 
tx × size. An interaction model involving such 
product terms could also be considered, but we 
present only the no-interaction model for 
illustrative purposes. 


Table 8.5. Ordered Failure Time 
and Risk Set Information for First 
26 Subjects in Bladder Cancer 
Study 

Columns: ordered failure time $t_{(j)}$; # in risk set $n_j$; 
# failed $m_j$; # censored in $[t_{(j)}, t_{(j+1)})$; 
subject ID #s for outcomes in $[t_{(j)}, t_{(j+1)})$ 

$t_{(j)}$   $n_j$   $m_j$   # censored   Subject ID #s 

 0          26      -       1            1 
 1          25      1       1            2, 18 
 2          24      2       0            19, 25 
 3          24      4       1            3, 13, 14, 16, 26 
 5          23      1       0            9 
 6          23      2       0            6, 26 
 7          23      1       1            4, 15 
 8          22      1       0            26 
 9          22      1       0            14 
10          22      2       2            5, 6, 12, 15 
12          20      2       1            7, 10, 26 
15          19      2       0            12, 16 
16          19      3       0            10, 13, 15 
17          19      1       3            8, 9, 10, 25 
21          16      1       0            14 
22          16      1       0            25 
23          16      1       3            11, 12, 13, 14 
24          12      1       0            15 
25          11      2       0            16, 20 
26          10      1       2            17, 18, 19 
28           7      1       4            20, 21, 22, 23, 24 
30           3      1       2            24, 25, 26 


Table 8.5 at the left provides ordered failure times 
and corresponding risk set information that would 
result if the first 26 subjects that we described in 
Table 8.4 comprised the entire dataset. (Recall that 
there are 86 subjects in the complete dataset.) 

Because we consider 26 subjects, the number in 
the risk set at ordered failure time $t_{(0)}$ is 
$n_{(0)} = 26$. As these subjects fail (i.e., develop 
a bladder cancer tumor) or are censored over 
follow-up, the number in the risk set will decrease 
from the jth to the (j + 1)th ordered failure time 
provided that no subject who fails at time $t_{(j)}$ 
either has a recurrent event at a later time or has 
additional follow-up time until later censorship. In 
other words, a subject who has additional follow-up 
time after having failed at $t_{(j)}$ does not drop 
out of the risk set after $t_{(j)}$. 
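
The risk-set bookkeeping described above can be checked with a short Python sketch. The dictionary below transcribes the (start, stop, event) columns of Table 8.4; the function names are hypothetical, and we assume the usual convention that a subject is at risk at time t when one of his or her intervals satisfies start < t <= stop.

```python
# (start, stop, event) intervals for the 26 subjects listed in Table 8.4.
INTERVALS = {
    1: [(0, 0, 0)],
    2: [(0, 1, 0)],
    3: [(0, 4, 0)],
    4: [(0, 7, 0)],
    5: [(0, 10, 0)],
    6: [(0, 6, 1), (6, 10, 0)],
    7: [(0, 14, 0)],
    8: [(0, 18, 0)],
    9: [(0, 5, 1), (5, 18, 0)],
    10: [(0, 12, 1), (12, 16, 1), (16, 18, 0)],
    11: [(0, 23, 0)],
    12: [(0, 10, 1), (10, 15, 1), (15, 23, 0)],
    13: [(0, 3, 1), (3, 16, 1), (16, 23, 1)],
    14: [(0, 3, 1), (3, 9, 1), (9, 21, 1), (21, 23, 0)],
    15: [(0, 7, 1), (7, 10, 1), (10, 16, 1), (16, 24, 1)],
    16: [(0, 3, 1), (3, 15, 1), (15, 25, 1)],
    17: [(0, 26, 0)],
    18: [(0, 1, 1), (1, 26, 0)],
    19: [(0, 2, 1), (2, 26, 1)],
    20: [(0, 25, 1), (25, 28, 0)],
    21: [(0, 29, 0)],
    22: [(0, 29, 0)],
    23: [(0, 29, 0)],
    24: [(0, 28, 1), (28, 30, 1)],
    25: [(0, 2, 1), (2, 17, 1), (17, 22, 1), (22, 30, 0)],
    26: [(0, 3, 1), (3, 6, 1), (6, 8, 1), (8, 12, 1), (12, 30, 0)],
}


def risk_set(t):
    """IDs of subjects with an interval covering time t (start < t <= stop)."""
    return sorted(sid for sid, ivs in INTERVALS.items()
                  if any(start < t <= stop for start, stop, _ in ivs))


def failures_at(t):
    """IDs of subjects whose interval ends in a failure exactly at time t."""
    return sorted(sid for sid, ivs in INTERVALS.items()
                  if any(stop == t and event == 1 for _, stop, event in ivs))


# Distinct months at which at least one failure occurs.
failure_times = sorted({stop for ivs in INTERVALS.values()
                        for _, stop, event in ivs if event == 1})

for t in failure_times:
    print(t, len(risk_set(t)), len(failures_at(t)))
```

A subject who fails and is followed further simply moves into his or her next interval, so the subject keeps appearing in later risk sets; only a subject whose last interval has ended drops out.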


Totals for Table 8.5: 32 failed, 21 censored. 


Table 8.6. Focus on Subject #s 19 
and 25 

$t_{(j)}$   $n_j$   $m_j$   $q_{(j)}$   Subject ID #s 

 0          26      -       1           1 
 1          25      1       1           2, 18 
 2          24      2       0           19, 25 
 3          24      4       1           3, 13, 14, 16, 26 
 ...
17          19      1       3           8, 9, 10, 25 
21          16      1       0           14 
22          16      1       0           25 
23          16      1       3           11, 12, 13, 14 
24          12      1       0           15 
25          11      2       0           16, 20 
26          10      1       2           17, 18, 19 
28           7      1       4           20, 21, 22, 23, 24 
30           3      1       2           24, 25, 26 


Table 8.7. Focus on Subject #s 3, 
13, 14, 16, 26 

$t_{(j)}$   $n_j$   $m_j$   $q_{(j)}$   Subject ID #s 

 0          26      -       1           1 
 1          25      1       1           2, 18 
 2          24      2       0           19, 25 
 3          24      4       1           3, 13, 14, 16, 26 
 5          23      1       0           9 
 6          23      2       0           6, 26 
 7          23      1       1           4, 15 
 8          22      1       0           26 
 9          22      1       0           14 
10          22      2       2           5, 6, 12, 15 
12          20      2       1           7, 10, 26 
15          19      2       0           12, 16 
16          19      3       0           10, 13, 15 
17          19      1       3           8, 9, 10, 25 
21          16      1       0           14 
22          16      1       0           25 
23          16      1       3           11, 12, 13, 14 
24          12      1       0           15 
25          11      2       0           16, 20 
26          10      1       2           17, 18, 19 
28           7      1       4           20, 21, 22, 23, 24 
30           3      1       2           24, 25, 26 


For example, at month $t_{(j)} = 2$, subject #s 19 and 
25 fail, but the number in the risk set at that time 
($n_j = 24$) does not decrease (by 2) going into the 
next failure time because each of these subjects 
has later recurrent events. In particular, subject 
#19 has a recurrent event at month $t_{(j)} = 26$ and 
subject #25 has two recurrent events at months 
$t_{(j)} = 17$ and $t_{(j)} = 22$ and has additional 
follow-up time until censored at month 30. 


As another example from Table 8.5, subject #s 3, 
13, 14, 16, 26 contribute information at ordered 
failure time $t_{(j)} = 3$, but the number in the risk 
set only drops from 24 to 23 even though the last 
four of these subjects all fail at $t_{(j)} = 3$. 
Subject #3 is censored at month 4 (see Table 8.4), so 
this subject is removed from the risk set after 
failure time $t_{(j)} = 3$. However, subjects 13, 14, 
16, and 26 all have recurrent events after 
$t_{(j)} = 3$, so they are not removed from the risk 
set after $t_{(j)} = 3$. 

Subject #26 appears in the last column 5 times. 
This subject contributes 5 (start, stop) time inter¬ 
vals, fails at months 3, 6, 8, and 12, and is also 
followed until month 30, when he is censored. 
“Gaps” in follow-up time: 

0 ------ 10   (gap)   25 ------ 50 
        lost         re-enter 

No-Interaction Cox PH Model 


Another situation, which is not illustrated in these 
data, involves “gaps” in a subject's follow-up time. 
A subject may leave the risk set (e.g., lost to 
follow-up) at, say, time = 10 and then re-enter the 
risk set and be followed from, say, time = 25 to 
time = 50. This subject has a follow-up gap during 
the period from time = 10 to time = 25. 
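
In the (start, stop) layout, such a gap needs no special handling: the subject simply has no line of data covering the gap. A minimal Python sketch, assuming the at-risk convention start < t <= stop and a hypothetical helper named at_risk:

```python
def at_risk(intervals, t):
    """True if some (start, stop) interval covers time t, i.e. start < t <= stop."""
    return any(start < t <= stop for start, stop in intervals)


# Followed from 0 to 10, lost to follow-up, then followed again from 25 to 50.
followup = [(0, 10), (25, 50)]

print([t for t in (5, 15, 30) if at_risk(followup, t)])  # -> [5, 30]
```

The subject is counted in the risk set at times 5 and 30 but not at time 15, which falls inside the gap.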


$h(t, X) = h_0(t) \exp[\beta\, tx + \gamma_1 num + \gamma_2 size]$ 

Partial likelihood function: 

$L = L_1 \times L_2 \times \cdots \times L_{22}$ 


The (partial) likelihood function (L) used to fit the 
no-interaction Cox PH model is expressed in typi¬ 
cal fashion as the product of individual likelihoods 
contributed by each ordered failure time and cor¬ 
responding risk set information in Table 8.5. There 
are 22 such terms in this product because there are 
22 ordered failure times listed in Table 8.5. 


$L_j$ = individual likelihood at $t_{(j)}$ 

= Pr[failing at $t_{(j)}$ | survival up to $t_{(j)}$] 

j = 1, 2, ..., 22 

$$L_j = \frac{\exp(\beta\, tx_{(j)} + \gamma_1 num_{(j)} + \gamma_2 size_{(j)})}{\sum_{s \in R(t_{(j)})} \exp(\beta\, tx_{s(j)} + \gamma_1 num_{s(j)} + \gamma_2 size_{s(j)})}$$ 


Each individual likelihood $L_j$ essentially gives the 
conditional probability of failing at time $t_{(j)}$, 
given survival (i.e., remaining in the risk set) at 
$t_{(j)}$. 

If there is only one failure at the jth ordered 
failure time, $L_j$ is expressed as shown at the left 
for the above no-interaction model. In this expression 
$tx_{(j)}$, $num_{(j)}$, and $size_{(j)}$ denote the 
values of the variables tx, num, and size for the 
subject failing at month $t_{(j)}$. 


$tx_{(j)}$, $num_{(j)}$, and $size_{(j)}$: values of tx, 
num, and size at $t_{(j)}$ 


$tx_{s(j)}$, $num_{s(j)}$, and $size_{s(j)}$: values of tx, 
num, and size for subject s in $R(t_{(j)})$ 


The terms $tx_{s(j)}$, $num_{s(j)}$, and $size_{s(j)}$ 
denote the values of the variables tx, num, and size 
for the subject s in the risk set $R(t_{(j)})$. Recall 
that $R(t_{(j)})$ consists of all subjects remaining at 
risk at failure time $t_{(j)}$. 
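
To make the single-failure expression concrete, the Python sketch below evaluates one likelihood term from (start, stop) data. It is an illustration only: the function name likelihood_term is hypothetical, and the toy dataset uses just three subjects from Table 8.4 (#s 14, 16, and 25), so its denominator has 3 terms rather than the full risk set of the actual analysis. The coefficient values are the estimates reported later in Table 8.8.

```python
import math


def likelihood_term(rows, t, beta):
    """Partial-likelihood term at failure time t, assuming exactly one failure at t.

    rows: dicts with keys start, stop, event, x, where x = (tx, num, size).
    A row is in the risk set at t when start < t <= stop.
    """
    at_risk = [r for r in rows if r["start"] < t <= r["stop"]]
    failing = [r for r in at_risk if r["stop"] == t and r["event"] == 1]
    if len(failing) != 1:
        raise ValueError("this sketch handles exactly one failure at time t")

    def hazard_score(r):
        return math.exp(sum(b * x for b, x in zip(beta, r["x"])))

    return hazard_score(failing[0]) / sum(hazard_score(r) for r in at_risk)


def make_rows(intervals, x):
    return [{"start": s, "stop": e, "event": d, "x": x} for s, e, d in intervals]


# Three subjects taken from Table 8.4; x = (tx, num, size).
rows = (
    make_rows([(0, 3, 1), (3, 9, 1), (9, 21, 1), (21, 23, 0)], (0, 3, 1))     # subject 14
    + make_rows([(0, 3, 1), (3, 15, 1), (15, 25, 1)], (0, 1, 1))              # subject 16
    + make_rows([(0, 2, 1), (2, 17, 1), (17, 22, 1), (22, 30, 0)], (0, 1, 5)) # subject 25
)

beta_hat = (-0.4071, 0.1607, -0.0401)  # estimates of beta, gamma_1, gamma_2 (Table 8.8)
print(likelihood_term(rows, 22, beta_hat))
```

At month 22 all three toy subjects are at risk and subject #25 is the one failing, so the numerator uses subject #25's covariates and the denominator sums over the three at-risk lines. With all coefficients set to 0, the term reduces to 1 divided by the size of the risk set.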


Data for Subject #25 

id   int   event   start   stop   tx   num   size 

25    1      1       0       2     0    1     5 
25    2      1       2      17     0    1     5 
25    3      1      17      22     0    1     5 
25    4      0      22      30     0    1     5 


j = 15th ordered failure time 
$n_{15}$ = 16 subjects in risk set at 
$t_{(15)}$ = 22: 


For example, subject #25 from Table 8.4 failed for 
the third time at month 22, which is the j = 15th 
ordered failure time listed in Table 8.5. It can be 
seen that $n_j = 16$ of the initial 26 subjects are 
still at risk at the beginning of month 22. The risk 
set at this time includes subject #25 and several 
other subjects (#s 12, 13, 14, 15, 16, 18, 19, 26) who 
already had at least one failure prior to month 22. 


$R(t_{(15)} = 22)$ = {subject #s 11, 12, 
13, 14, 15, 16, 17, 18, 19, 20, 21, 
22, 23, 24, 25, 26} 









$$L_{15} = \frac{\exp(\beta(0) + \gamma_1(1) + \gamma_2(5))}{\sum_{s \in R(t_{(15)})} \exp(\beta\, tx_{s(15)} + \gamma_1 num_{s(15)} + \gamma_2 size_{s(15)})}$$ 

The corresponding likelihood $L_{15}$ at $t_{(15)} = 22$ 
is shown at the left. Subject #25's values 
$tx_{25(15)} = 0$, $num_{25(15)} = 1$, and 
$size_{25(15)} = 5$ have been inserted into the 
numerator of the formula. The denominator will 
contain a sum of 16 terms, one for each subject in 
the risk set at $t_{(15)} = 22$. 

Computer program formulates 
partial likelihood L 
(See Computer Appendix) 

The overall partial likelihood L will be formulated 
internally by the computer program once the data 
layout is in the correct form and the program code 
used involves the (start, stop) formulation. 


VI. Robust Estimation 


Data for Subject #14 

id   int   event   start   stop   tx   num   size 

14    1      1       0       3     0    3     1 
14    2      1       3       9     0    3     1 
14    3      1       9      21     0    3     1 
14    4      0      21      23     0    3     1 


As illustrated for subject #14 at the left, each sub¬ 
ject contributes a line of data for each time inter¬ 
val corresponding to each recurrent event and any 
additional event-free follow-up interval. 


Up to this point: 
the 4 lines of data for subject #14 are 
treated as independent observations 

We have also pointed out that the Cox model 
analysis described up to this point treats different 
lines of data contributed by the same subject as 
if they were independent contributions from 
different subjects. 


Nevertheless, 

• Observations of the same 
subject are correlated 

• Makes sense to adjust for such 
correlation in the analysis 

Nevertheless, it makes sense to view the different 
intervals contributed by a given subject as 
representing correlated observations on the same 
subject that must be accounted for in the analysis. 


Robust (Empirical) Estimation 

• Adjusts $\widehat{Var}(\hat{\beta}_k)$, where 
$\hat{\beta}_k$ is an estimated regression 
coefficient 

• Accounts for misspecification of 
assumed correlation structure 


A widely used technique for adjusting for the cor¬ 
relation among outcomes on the same subject is 
called robust estimation (also referred to as em¬ 
pirical estimation). This technique essentially in¬ 
volves adjusting the estimated variances of re¬ 
gression coefficients obtained for a fitted model 
to account for misspecification of the correlation 
structure assumed (see Zeger and Liang, 1986 and 
Kleinbaum and Klein, 2002). 
CP approach: assumes 
independence 

Goal of robust estimation: adjust for 
correlation within subjects 

In the CP approach, the assumed correlation 
structure is independence; that is, the Cox PH 
model that is fit assumes that different outcomes 
on the same subject are independent. Therefore 
the goal of robust estimation for the CP approach 
is to obtain variance estimators that adjust for 
correlation within subjects when previously no such 
correlation was assumed. 

Same goal for other approaches for 
analyzing recurrent event data 

This is the same goal for other approaches for 
analyzing recurrent event data that we describe 
later in this chapter. 

Do not adjust $\hat{\beta}_k$ 

Only adjust $\widehat{Var}(\hat{\beta}_k)$ 

Note that the estimated regression coefficients 
$\hat{\beta}_k$ themselves are not adjusted; only the 
estimated variances of these coefficients are 
adjusted. 

Robust (Empirical) Variance 
allows 
tests of hypotheses and 
confidence intervals 
that account for correlated data 

The robust (i.e., empirical) estimator of the 
variance of an estimated regression coefficient 
therefore allows tests of hypotheses and confidence 
intervals about model parameters that account for 
correlation within subjects. 

Matrix formula: 
derived from ML estimation 

We briefly describe the formula for the robust 
variance estimator below. This formula is in matrix 
form and involves terms that derive from the set 
of “score” equations that are used to solve for ML 
estimates of the regression coefficients. This 
information may be of interest to the more 
mathematically inclined reader with some background 
in methods for the analysis of correlated data 
(Kleinbaum and Klein, 2002). 

Formula not essential for using 
computer packages 

However, the information below is not essential 
for an understanding of how to obtain robust 
estimators using computer packages. (See Computer 
Appendix for the code using SAS and Stata.) 
Extension (Lin and Wei, 1989) of 
information sandwich estimator 
(Zeger and Liang, 1986) 

The robust estimator for recurrent event data was 
derived by Lin and Wei (1989) as an extension 
similar to the “information sandwich estimator” 
proposed by Zeger and Liang (1986) for generalized 
linear models. SAS and Stata use variations of this 
estimator that give slightly different estimates. 


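Matrix formula 

$$R(\hat{\beta}) = \widehat{Var}(\hat{\beta}) \, [R_S' R_S] \, \widehat{Var}(\hat{\beta})$$ 

where 

$\widehat{Var}(\hat{\beta})$ is the information-matrix 
form of the estimated variances and covariances, and 

$R_S$ is the matrix of score residuals 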


The general form of this estimator can be most 
conveniently written in matrix notation as shown 
at the left. In this formula, the variance expres¬ 
sion denotes the information matrix form of es¬ 
timated variances and covariances obtained from 
(partial) ML estimation of the Cox model being fit. 
The Rs expression in the middle of the formula de¬ 
notes the matrix of score residuals obtained from 
ML estimation. 
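
For readers who find code easier to follow than matrix notation, the sandwich product can be sketched in plain Python. This is only an illustration of the matrix arithmetic: the function names are hypothetical, the numbers are made up for demonstration and are not taken from the bladder cancer analysis, and, as noted above, SAS and Stata use variations of the estimator, so the sketch should not be read as the exact computation performed by either package.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def robust_variance(var_model, score_resid):
    """Sandwich product Var * (Rs' Rs) * Var.

    var_model:   p x p information-matrix (model-based) covariance estimate.
    score_resid: n x p matrix of score residuals, one row per unit.
    """
    middle = matmul(transpose(score_resid), score_resid)  # p x p
    return matmul(matmul(var_model, middle), var_model)


# Hypothetical 2-parameter example (illustrative numbers only).
var_model = [[2.0, 0.0],
             [0.0, 3.0]]
score_resid = [[1.0, 1.0]]

print(robust_variance(var_model, score_resid))  # -> [[4.0, 6.0], [6.0, 9.0]]
```

The two outer factors are the same model-based covariance matrix; only the middle term, built from the score residuals, carries the empirical information that adjusts the variances.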


Formula applies to other approaches 
for analyzing recurrent event data 

The robust estimation formula described above 
applies to the CP approach as well as other 
approaches for analyzing recurrent event data 
described later in this chapter. 


VII. Results for CP Example 

We now describe the results from using the CP 
approach on the Bladder Cancer Study data 
involving all 85 subjects. 


Table 8.8. Edited SAS Output 
from CP Approach on Bladder 
Cancer Data (N = 85 Subjects) 
Without Robust Variances 

             Parameter   Std 
Var    DF    Estimate    Error     Chisq     P       HR 

tx     1     -0.4071     0.2001     4.140    0.042   0.667 
num    1      0.1607     0.0480    11.198    0.001   1.174 
size   1     -0.0401     0.0703     0.326    0.568   0.961 


Table 8.8 gives edited output from fitting the no¬ 
interaction Cox PH model involving the three pre¬ 
dictors tx, num, and size. A likelihood ratio chunk 
test for interaction terms tx x num and tx x 
size was nonsignificant, thus supporting the no¬ 
interaction model shown here. The PH assump¬ 
tion was assumed satisfied for all three variables. 


-2 LOG L = 920.159 


Table 8.9. Robust Covariance 
Matrix, CP Approach on Bladder 
Cancer Data 

         tx          num         size 

tx        0.05848    -0.00270    -0.00051 
num      -0.00270     0.00324     0.00124 
size     -0.00051     0.00124     0.00522 


Table 8.9 provides the covariance matrix obtained 
from robust estimation of the variances of the es¬ 
timated regression coefficients of tx, num, and 
size. The values on the diagonal of this matrix 
give robust estimates of these variances and the 
off-diagonal values give covariances. 
Robust standard error for tx 
= square-root (.05848) = 0.2418 

Nonrobust standard error for tx 
= 0.2001 


Summary of Results from 
CP Approach 

Hazard Ratio tx: exp(-0.407) = 0.667 (= 1/1.5) 

                       robust    nonrobust 
Wald Chi-Square tx:    2.83      4.14 
P-value tx:            .09       .04 

($H_0$: no effect of tx, $H_A$: two-sided) 

95% CI for HR tx (robust): (0.414, 1.069) 

$H_A$ one-sided: both P-values < .05 

We return to the analysis of these 
data when we discuss other 
approaches for analysis of recurrent 
event data. 


Because the exposure variable of interest in this 
study is tx, the most important value in this 
matrix is 0.05848. The square root of this value is 
0.2418, which gives the robust standard error of 
the estimated coefficient of the tx variable. Notice 
that this robust estimator is similar to, but 
somewhat larger than, the nonrobust estimator of 
0.2001 shown in Table 8.8. 

We now summarize the CP results for the effect 
of the exposure variable tx on recurrent event sur¬ 
vival controlling for num and size. The hazard ra¬ 
tio estimate of .667 indicates that the hazard for 
the placebo is 1.5 times the hazard for the treat¬ 
ment. 

Using robust estimation, the Wald statistic for this 
hazard ratio is borderline nonsignificant (P = .09). 
Using the nonrobust estimator, the Wald statistic 
is borderline significant (P = .04). Both these P- 
values, however, are for a two-sided alternative. 
For a one-sided alternative, both P-values would 
be significant at the .05 level. The 95% confidence 
interval using the robust variance estimator is 
quite wide in any case. 
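
The arithmetic behind this summary can be reproduced with a short Python sketch that uses only the values reported in Tables 8.8 and 8.9. The function name wald_summary is our own, and we assume the usual large-sample Wald formulas with a 1-df chi-square reference distribution and a 1.96 multiplier for the 95% interval.

```python
import math

beta_tx = -0.4071        # estimated coefficient of tx (Table 8.8)
robust_var = 0.05848     # robust variance of the tx coefficient (Table 8.9)
nonrobust_se = 0.2001    # nonrobust standard error (Table 8.8)


def wald_summary(beta, se):
    """Return (Wald chi-square, two-sided P-value, 95% CI for the hazard ratio)."""
    chisq = (beta / se) ** 2
    # Upper-tail probability of a 1-df chi-square: P(|Z| > sqrt(chisq)).
    p_two_sided = math.erfc(math.sqrt(chisq / 2.0))
    ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
    return chisq, p_two_sided, ci


robust_se = math.sqrt(robust_var)

print("robust SE:", round(robust_se, 4))
print("robust:", wald_summary(beta_tx, robust_se))
print("nonrobust:", wald_summary(beta_tx, nonrobust_se))
```

Only the standard error changes between the two calls; the coefficient, and therefore the hazard ratio estimate, is the same under robust and nonrobust estimation. Halving either two-sided P-value gives the corresponding one-sided P-value.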


VIII. Other Approaches: 
Stratified Cox 

3 stratified Cox (SC) approaches: 

Conditional 1 and Conditional 2 
(Prentice, Williams, and Peterson, 1981) 

Marginal 
(Wei, Lin, and Weissfeld, 1989) 


We now describe three other approaches for 
analyzing recurrent event data, each of which uses a 
Stratified Cox (SC) PH model. They are called 
conditional 1, conditional 2, and marginal. These 
approaches are often used to distinguish the 
order in which recurrent events occur. 


Goal: distinguish order of recurrent 
events 


Strata variable: time interval # 
treated as categorical 

The “strata” variable for each approach treats the 
time interval number as a categorical variable. 
Example: 

maximum of 4 failures per subject 

Strata variable = 1 for time interval #1 
                  2 for time interval #2 
                  3 for time interval #3 
                  4 for time interval #4 

Conditional: time between two 
events 

Conditional 1     0        50  →  80 
                entry 

Conditional 2               0  →  30 
                           ev1    ev2 


Marginal 

• Total survival time from study 
entry until kth event 

• Recurrent events of different 
types 


Conditional 1 for Subject 10 

id   int   event   start   stop   tx   num   size 

10    1      1       0      12     0    1     1 
10    2      1      12      16     0    1     1 
10    3      0      16      18     0    1     1 


Conditional 2 for Subject 10 
(stop = Interval Length Since Previous Event) 

id   int   event   start   stop   tx   num   size 

10    1      1       0      12     0    1     1 
10    2      1       0       4     0    1     1 
10    3      0       0       2     0    1     1 


Marginal approach 
Standard (nonrecurrent event) 
layout (i.e., without (start, stop) 
columns) 


For example, if the maximum number of failures 
that occur on any given subject in the dataset is, 
say, 4, then time interval #1 is assigned to stratum 
1, time interval #2 to stratum 2, and so on. 


Both conditional approaches focus on survival 
time between two events. However, conditional 1 
uses the actual times of the two events from study 
entry, whereas conditional 2 starts survival time 
at 0 for the earlier event and stops at the later 
event. 


The marginal approach, in contrast to each con¬ 
ditional approach, focuses on total survival time 
from study entry until the occurrence of a spe¬ 
cific (e.g., kth) event; this approach is suggested 
when recurrent events are viewed to be of differ¬ 
ent types. 

The conditional 1 approach uses the exact same 
(start, stop) data layout format used for the CP 
approach, except that for conditional 1, an SC 
model is used rather than a standard (unstratified) 
PH model. The strata variable here is int in this 
listing. 


The conditional 2 approach also uses a (start, 
stop) data layout, but the start value is always 0 
and the stop value is the time interval length since 
the previous event. The model here is also a SC 
model. 


The marginal approach uses the standard (nonrecurrent
event) data layout instead of the (start,
stop) layout, as illustrated below.

Marginal Approach for Subject 10

id   int   event   stime   tx   num   size
10    1      1      12      0    1     1
10    2      1      16      0    1     1
10    3      0      18      0    1     1
10    4      0      18      0    1     1


Marginal approach 
Each subject at risk for all failures 
that might occur 

# actual failures < # possible failures 

Bladder cancer data: 

Maximum # (possible) failures = 4 

So, subject 10 (as well as all other 
subjects) gets 4 lines of data 


Fundamental Difference Among 
the 3 SC Approaches 

Risk set differs for strata after first 
event 

Conditional 2: time until 1st event 
does not influence risk set for later 
events (i.e., clock reset to 0 after 
event occurs) 


Conditional 1: time until 1st event 
influences risk set for later events 


Marginal: risk set determined from 
time since study entry 


The marginal approach layout, shown at the left,
contains four lines of data in contrast to the three
lines of data that would appear for subject #10
using the CP, conditional 1, and conditional 2
approaches.


The reason why there are 4 lines of data here is 
that, for the marginal approach, each subject is 
considered to be at risk for all failures that might 
occur, regardless of the number of events a subject 
actually experienced. 

Because the maximum number of failures being
considered in the bladder cancer data is 4
(e.g., for subject #s 15 and 26), subject #10, who
failed only twice, will have two additional lines
of data corresponding to the two additional failures
that could have possibly occurred for this
subject.
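
The three layouts for subject #10 can be generated from the same raw information. The following is a minimal sketch, assuming each subject is described by a list of event times from study entry plus an end-of-follow-up time; the function name `recurrent_layouts` and its arguments are hypothetical and chosen only for illustration.

```python
def recurrent_layouts(subject_id, event_times, end_of_followup, max_events):
    """Build conditional 1 (= CP), conditional 2, and marginal rows.

    event_times: increasing times (from study entry) of observed events.
    end_of_followup: time at which observation of the subject stops.
    max_events: largest number of events considered for any subject.
    """
    # Interval boundaries: each event time, plus a final censored interval
    # if follow-up continues after the last event.
    stops = list(event_times)
    status = [1] * len(event_times)
    if end_of_followup > (event_times[-1] if event_times else 0):
        stops.append(end_of_followup)
        status.append(0)
    starts = [0] + stops[:-1]

    # Conditional 1 (same layout as CP): actual (start, stop) times from entry.
    cond1 = [(subject_id, k + 1, status[k], starts[k], stops[k])
             for k in range(len(stops))]
    # Conditional 2: clock reset to 0, stop = time since the previous event.
    cond2 = [(subject_id, k + 1, status[k], 0, stops[k] - starts[k])
             for k in range(len(stops))]
    # Marginal: one line per possible event, total time from study entry.
    marginal = []
    for k in range(max_events):
        if k < len(stops):
            marginal.append((subject_id, k + 1, status[k], stops[k]))
        else:
            marginal.append((subject_id, k + 1, 0, end_of_followup))
    return cond1, cond2, marginal


# Subject #10: events at months 12 and 16, follow-up ends at month 18.
cond1, cond2, marginal = recurrent_layouts(10, [12, 16], 18, max_events=4)
for name, rows in (("conditional 1", cond1),
                   ("conditional 2", cond2),
                   ("marginal", marginal)):
    print(name, rows)
```

Each conditional tuple is (id, int, event, start, stop) and each marginal tuple is (id, int, event, stime); the covariates tx, num, and size are omitted because they do not change across lines for a subject.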

The three alternative SC approaches (conditional 
1, conditional 2, and marginal) fundamentally 
differ in the way the risk set is determined for 
strata corresponding to events after the first event. 


With conditional 2, the time until the first event 
does not influence the composition of the risk set 
for a second or later event. In other words, the 
clock for determining who is at risk gets reset to 0 
after each event occurs. 

In contrast, with conditional 1, the time until the 
first event affects the composition of the risk set 
for later events. 

With the marginal approach, the risk set for the
kth event (k = 1, 2, ...) identifies those at risk for
the kth event since entry into the study.

EXAMPLE

                         Days
ID   Status   Stratum   Start   Stop   tx
M      1         1         0     100    1
M      1         2       100     105    1
H      1         1         0      30    0
H      1         2        30      50    0
P      1         1         0      20    0
P      1         2        20      60    0
P      1         3        60      85    0


Conditional 1

       Stratum 1                    Stratum 2
t(j)   n_j   R(t(j))         t(j)   n_j   R(t(j))
  0     3    {M, H, P}        20     1    {P}
 20     3    {M, H, P}        30     2    {H, P}
 30     2    {M, H}           50     2    {H, P}
100     1    {M}              60     1    {P}
                             105     1    {M}


Conditional 2

       Stratum 1                    Stratum 2
t(j)   n_j   R(t(j))         t(j)   n_j   R(t(j))
  0     3    {M, H, P}         0     3    {M, H, P}
 20     3    {M, H, P}         5     3    {M, H, P}
 30     2    {M, H}           20     2    {H, P}
100     1    {M}              40     1    {P}


Suppose, for example, that Molly (M), Holly (H),
and Polly (P) are the only three subjects in the
dataset shown at the left. Molly receives the treatment
(tx = 1) whereas Holly and Polly receive the
placebo (tx = 0). All three have recurrent events
at different times. Also, Polly experiences three
events whereas Molly and Holly experience two.


The table at the left shows how the risk set changes
over time for strata 1 and 2 if the conditional 1
approach is used. For stratum 2, there are no subjects
in the risk set until t = 20, when Polly gets
the earliest first event and so becomes at risk for a
second event. Holly enters the risk set at t = 30. So
at t = 50, when the earliest second event occurs,
the risk set contains Holly and Polly. Molly is not
at risk for getting her second event until t = 100.
The risk set at t = 60 contains only Polly because
Holly has already had her second event at t = 50.
And the risk set at t = 105 contains only Molly because
both Holly and Polly have already had their
second event by t = 105.

The next table shows how the risk set changes
over time if the conditional 2 approach is used.
Notice that the data for stratum 1 are identical to
those for conditional 1. For stratum 2, however,
all three subjects are at risk for the second event
at t = 0 and at t = 5, when Molly gets her second
event 5 days after the first occurs. The risk set
at t = 20 contains Holly and Polly because Molly
has already had her second event by t = 20. And
the risk set at t = 40 contains only Polly because
both Molly and Holly have already had their second
event by t = 40.

Marginal

       Stratum 1                    Stratum 2
t(j)   n_j   R(t(j))         t(j)   n_j   R(t(j))
  0     3    {M, H, P}         0     3    {M, H, P}
 20     3    {M, H, P}        50     3    {M, H, P}
 30     2    {M, H}           60     2    {M, P}
100     1    {M}             105     1    {M}


Stratum 3 for Marginal approach 
follows 


Marginal
Stratum 3

t(j)   n_j   R(t(j))
  0     3    {M, H, P}
 85     2    {M, P}

Note: H censored by t = 85


Basic idea (marginal approach): 

Each failure considered a 
separate process 

Allows stratifying on 

• Failure order 

• Different failure type 

(e.g., stage 1 vs. stage 2 cancer) 

Stratified Cox PH (SC) Model for all 
3 alternative approaches 

Use standard computer program for 
SC (e.g., SAS's PHREG, Stata's stcox, 
SPSS’s coxreg) 

No-interaction SC model for bladder
cancer data

$h_g(t, X) = h_{0g}(t) \exp[\beta\, tx + \gamma_1\, num + \gamma_2\, size]$


We next consider the marginal approach. For
stratum 1, the data are identical again to those for
conditional 1. For stratum 2, however, all three
subjects are at risk for the second event at t = 0
and at t = 50, when Holly gets her second event.
The risk set at t = 60 contains Molly and Polly because
Holly has already had her second event at
t = 50. And the risk set at t = 105 contains only
Molly because both Holly and Polly have already
had their second event by t = 60.

Because Polly experienced three events, there is 
also a third stratum for this example, which we 
describe for the marginal approach only. 

Using the marginal approach, all three subjects are
considered at risk for the third event when they
enter the study (t = 0), even though Molly and
Holly actually experience only two events. At t =
85, when Polly has her third event, Holly, whose
follow-up ended at t = 50, is no longer in the
risk set, which still includes Molly because Molly’s
follow-up continues until t = 105.
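
The risk sets in the three tables can be checked mechanically. The sketch below is an illustration only: it assumes follow-up ends at each subject's last event (as in this example), evaluates risk sets at event times, and uses the convention start < t <= stop for membership; the names `EVENTS`, `intervals`, and `risk_set` are hypothetical.

```python
# Event times from study entry for the three subjects in the example;
# follow-up is assumed to end at each subject's last event.
EVENTS = {"M": [100, 105], "H": [30, 50], "P": [20, 60, 85]}


def intervals(approach, stratum):
    """Return {subject: (start, stop)} for one stratum under one approach."""
    out = {}
    for subject, times in EVENTS.items():
        k = stratum - 1
        if approach == "marginal":
            # At risk from study entry until the kth event, or until the end
            # of follow-up if the subject never has a kth event.
            out[subject] = (0, times[k] if k < len(times) else times[-1])
        elif k < len(times):
            previous = times[k - 1] if k > 0 else 0
            if approach == "conditional1":
                out[subject] = (previous, times[k])      # actual times
            else:  # "conditional2"
                out[subject] = (0, times[k] - previous)  # clock reset to 0
    return out


def risk_set(approach, stratum, t):
    """Subjects with start < t <= stop in the given stratum."""
    return sorted(s for s, (start, stop) in intervals(approach, stratum).items()
                  if start < t <= stop)


for approach, t in (("conditional1", 50), ("conditional2", 20), ("marginal", 60)):
    print(approach, "stratum 2, t =", t, risk_set(approach, 2, t))
print("marginal stratum 3, t = 85", risk_set("marginal", 3, 85))
```

Only the marginal branch assigns an interval to a subject who never has a kth event, which is why Molly appears in the stratum 3 risk set at t = 85 under that approach alone.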

The basic idea behind the marginal approach is
that it allows each failure to be considered as a
separate process. Consequently, the marginal approach
not only allows the investigator to consider
the ordering of failures as separate events (i.e.,
strata) of interest, but also allows the different failures
to represent different types of events that may
occur on the same subject.


All three alternative approaches, although differing
in the form of data layout and the way the risk
set is determined, nevertheless use a stratified Cox
PH model to carry out the analysis. This allows a
standard program that fits a SC model (e.g., SAS's
PHREG) to perform the analysis.

The models used for the three alternative SC approaches
are therefore of the same form. For example,
we show on the left the no-interaction SC
model appropriate for the bladder cancer data we
have been illustrating.


where g = 1, 2, 3, 4

Two types of SC models:

No-interaction versus interaction
model

Typically compared using LR test

As described previously in Chapter 5 on the stratified
Cox procedure, a no-interaction stratified Cox
model is not appropriate if there is interaction
between the stratified variables and the predictor
variables put into the model. Thus, it is necessary
to assess whether an interaction version of the SC
model is more appropriate, as typically carried out
using a likelihood ratio test.

Version 1: Interaction SC Model

$h_g(t, X) = h_{0g}(t) \exp[\beta_g\, tx + \gamma_{1g}\, num + \gamma_{2g}\, size]$

g = 1, 2, 3, 4

For the bladder cancer data, we show at the
left two equivalent versions of the SC interaction
model. The first version separates the data into 4
separate models, one for each stratum.

Version 2: Interaction SC Model

$h_g(t, X) = h_{0g}(t) \exp[\beta\, tx + \gamma_1\, num + \gamma_2\, size
  + \delta_{11}(Z_1^* \times tx) + \delta_{12}(Z_2^* \times tx) + \delta_{13}(Z_3^* \times tx)
  + \delta_{21}(Z_1^* \times num) + \delta_{22}(Z_2^* \times num) + \delta_{23}(Z_3^* \times num)
  + \delta_{31}(Z_1^* \times size) + \delta_{32}(Z_2^* \times size) + \delta_{33}(Z_3^* \times size)]$

where $Z_1^*$, $Z_2^*$, and $Z_3^*$ are 3 dummy
variables for the 4 strata.

The second version contains product terms involving
the stratified variable with each of the 3 predictors
in the model. Because there are 4 strata,
the stratified variable is defined using 3 dummy
variables $Z_1^*$, $Z_2^*$, and $Z_3^*$.

H0 (Version 1)

$\beta_1 = \beta_2 = \beta_3 = \beta_4 \equiv \beta$,
$\gamma_{11} = \gamma_{12} = \gamma_{13} = \gamma_{14} \equiv \gamma_1$,
$\gamma_{21} = \gamma_{22} = \gamma_{23} = \gamma_{24} \equiv \gamma_2$

H0 (Version 2)

$\delta_{11} = \delta_{12} = \delta_{13} = \delta_{21} = \delta_{22} = \delta_{23}
  = \delta_{31} = \delta_{32} = \delta_{33} = 0$

The null hypotheses for the LR test that compares
the interaction with the no-interaction SC model
are shown at the left for each version. The df for the
LR test is 9.

Interaction SC model may be used
regardless of LR test result

• Allows separate HRs for tx for 
each stratum 

• If no-interaction SC, then only 
an overall effect of tx can be 
estimated 


Even if the no-interaction SC model is found more
appropriate from the likelihood ratio test, the investigator
may still wish to use the interaction SC
model in order to obtain and evaluate different
hazard ratios for each stratum. In other words, if
the no-interaction model is used, it is not possible
to separate out the effects of predictors (e.g., tx)
within each stratum, and only an overall effect of
a predictor on survival can be estimated.
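
To make the product-term version of the interaction model concrete, the sketch below builds the terms for one line of data and counts them. It is illustrative only: the choice of the last stratum as the reference category and the name `interaction_terms` are assumptions, not something specified in the text.

```python
def interaction_terms(stratum, predictors, n_strata=4):
    """Return the product terms (dummy variable x predictor) for one data line.

    Strata 1..n_strata-1 each get a dummy variable; the last stratum is the
    reference category (an arbitrary choice made for this sketch).
    """
    dummies = [1 if stratum == i else 0 for i in range(1, n_strata)]
    return {f"Z{i}_x_{name}": z * value
            for name, value in predictors.items()
            for i, z in enumerate(dummies, start=1)}


# A hypothetical data line in stratum 2 with tx = 1, num = 3, size = 1.
row = interaction_terms(stratum=2, predictors={"tx": 1, "num": 3, "size": 1})
print(row)

# Number of product terms = (number of strata - 1) x (number of predictors),
# which is the df of the LR test comparing the two models.
print(len(row))
```

With 4 strata and 3 predictors the count is 3 x 3 = 9, matching the df stated for the LR test.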


Recommend using
robust estimation

$R(\hat\beta) = \widehat{Var}(\hat\beta)\,[R_S' R_S]\,\widehat{Var}(\hat\beta)$

to adjust for correlation of observations
on the same subject


For each of the SC alternative approaches, as for
the CP approach, it is recommended to use robust
estimation to adjust the variances of the estimated
regression coefficients for the correlation
of observations on the same subject. The general
form for the robust estimator is the same as in the
CP approach, but will give different numerical results
because of the different data layouts used in
each method.


IX. Bladder Cancer Study 
Example (Continued) 


Table 8.10. Estimated βs and HRs
for tx from Bladder Cancer Data

Model      β        HR = exp(β)
CP      -0.407      0.666 (=1/1.50)
C1      -0.334      0.716 (=1/1.40)
C2      -0.270      0.763 (=1/1.31)
M       -0.580      0.560 (=1/1.79)

CP = Counting Process, C1 = Conditional 1,
C2 = Conditional 2, M = Marginal


We now present and compare SAS results from using
all four methods we have described (CP, conditional 1,
conditional 2, and marginal) for analyzing
the recurrent event data from the bladder
cancer study.

Table 8.10 gives the regression coefficients for the
tx variable and their corresponding hazard ratios
(i.e., exp(β)) for the no-interaction Cox PH models
using these four approaches. The model used
for the CP approach is a standard Cox PH model
whereas the other three models are SC models that
stratify on the event order.


HR for M: 0.560 (=1/1.79)
differs from

HRs for CP: 0.666 (=1/1.50),
C1: 0.716 (=1/1.40),
C2: 0.763 (=1/1.31)


From this table, we can see that the hazard ratio
for the effect of the exposure variable tx differs
somewhat for each of the four approaches, with
the marginal model giving a much different result
from that obtained from the other three approaches.

Table 8.11. Estimated βs, SE(β)s,
and P-Values for tx from
No-Interaction Model for Bladder
Cancer Data

Model      β       SE(NR)   SE(R)   P(NR)   P(R)
CP      -0.407     0.200    0.242   .042    .092
C1      -0.334     0.216    0.197   .122    .090
C2      -0.270     0.208    0.208   .195    .194
M       -0.580     0.201    0.303   .004    .056

CP = Counting Process, C1 = Conditional 1,
C2 = Conditional 2, M = Marginal,
NR = Nonrobust, R = Robust, P = Wald P-value


SE(NR) differs from SE(R)
P(NR) differs from P(R)
but no clear pattern

for example,

CP: P(NR) = .042 < P(R) = .092
C1: P(NR) = .122 > P(R) = .090
C2: P(NR) = .195 ≈ P(R) = .194


Wald test statistic(s):

$Z = \hat\beta / SE(\hat\beta) \sim N(0, 1)$ under $H_0: \beta = 0$
$Z^2 = [\hat\beta / SE(\hat\beta)]^2 \sim \chi^2_{1\,df}$ under $H_0: \beta = 0$


Table 8.12. Estimated βs and
Robust SE(β)s for tx from
Interaction SC Model for Bladder
Cancer Data

               Interaction SC Model                No
          Str1     Str2     Str3     Str4      Interaction
           β1       β2       β3       β4           β
Model     (SE)     (SE)     (SE)     (SE)        (SE)
CP         -        -        -        -         -.407
                                                (.242)
C1       -.518    -.459     .117    -.041       -.334
         (.308)   (.441)   (.466)   (.515)      (.197)
C2       -.518    -.259     .221    -.195       -.270
         (.308)   (.402)   (.620)   (.628)      (.208)
M        -.518    -.619    -.700    -.651       -.580
         (.308)   (.364)   (.415)   (.490)      (.303)

CP = Counting Process, C1 = Conditional 1,
C2 = Conditional 2, M = Marginal


Table 8.11 provides, again for the exposure variable
tx only, the regression coefficients, robust
standard errors, nonrobust standard errors, and
corresponding Wald statistic P-values obtained
from using the no-interaction model with each
approach.


The nonrobust and robust standard errors and
P-values differ to some extent for each of the different
approaches. There is also no clear pattern
to suggest that the nonrobust results will always
be either higher or lower than the corresponding
robust results.


The P-values shown in Table 8.11 are computed using
the standard Wald test Z or chi-square statistic,
the latter having a chi-square distribution with 1
df under the null hypothesis that there is no effect
of tx.
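
The Wald P-values in Table 8.11 can be approximately reproduced from the tabulated coefficients and standard errors. This is a minimal sketch using only the rounded table entries, so small discrepancies in the last digit are to be expected; the function name `wald_p_value` is hypothetical.

```python
import math


def wald_p_value(beta, se):
    """Two-sided P-value for H0: beta = 0 from the Wald Z statistic."""
    z = beta / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)); this is
    # the same as the upper tail of a 1-df chi-square evaluated at z squared.
    return math.erfc(abs(z) / math.sqrt(2))


# CP approach: same coefficient, nonrobust versus robust standard error.
p_nonrobust = wald_p_value(-0.407, 0.200)
p_robust = wald_p_value(-0.407, 0.242)
print(round(p_nonrobust, 3), round(p_robust, 3))
```

The coefficient is the same in both calls; only the standard error changes, which is why robust estimation alters the P-value but not the estimated hazard ratio.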

Table 8.12 presents, again for the exposure variable
tx only, the estimated regression coefficients
and robust standard errors for both the interaction
and the no-interaction SC models for the 3
approaches (other than the CP approach) that use
a SC model.

Notice that for each of the 3 SC modeling approaches,
the estimated βs and corresponding
standard errors are different over the four strata
as well as for the no-interaction model. For example,
using the conditional 1 approach, the estimated
coefficients are -0.518, -0.459, -0.117,
-0.041, and -0.334 for strata 1 through 4 and
the no-interaction model, respectively.

Version 1: Interaction SC Model

$h_g(t, X) = h_{0g}(t) \exp[\beta_g\, tx + \gamma_{1g}\, num + \gamma_{2g}\, size]$

g = 1, 2, 3, 4

Note: subscript g allows for different
regression coefficients for each
stratum

Such differing results over the different strata
should be expected because they result from fitting
an interaction SC model, which by definition
allows for different regression coefficients over the
strata.


Conditional 1 for Subject 10

id   int   event   start   stop   tx   num   size
10    1      1       0      12     0    1     1


Conditional 2 for Subject 10
Stop = Interval Length Since
Previous Event

id   int   event   start   stop   tx   num   size
10    1      1       0      12     0    1     1

Marginal Approach for Subject 10

id   int   event   stime   tx   num   size
10    1      1      12      0    1     1


Note: int = stratum #


Notice also that for stratum 1 the estimated β
and its standard error are identical (-0.518 and
0.308, resp.) for the conditional 1, conditional
2, and marginal approaches. This is as expected
because, as illustrated for subject #10 at the left,
the survival time information for the first stratum
is the same for the 3 SC approaches
and does not start to differ until stratum 2.


Marginal approach 

start time = 0 always 
stop time = stime 

Subject #10: (start, stop) = (0, 12) 


Although the data layout for the marginal approach
does not require (start, stop) columns, the
start time for the first stratum (and all other strata)
is 0 and the stop time is given in the stime column.
In other words, for stratum 1 on subject #10, the
start time is 0 and the stop time is 12, which is the
same as for the conditional 1 and 2 data for this
subject.


Bladder Cancer Study

1. Which approach is best?

2. Conclusion about tx?

So, based on all the information we have provided
above concerning the analysis of the bladder cancer
study,

1. Which of the four recurrent event analysis approaches
is best?

2. What do we conclude about the estimated effect
of tx controlling for num and size?

Which of the 4 approaches is best?

It depends! 

CP: Don’t want to distinguish 
recurrent event order 
Want overall effect 


The answer to question 1 is probably best phrased
as, “It depends!” Nevertheless, if the investigator
does not want to distinguish between recurrent
events on the same subject and wishes an overall
conclusion about the effect of tx, then the CP approach
seems quite appropriate, as for this study.


If event order important:

Choose from the 3 SC approaches.

If, however, the investigator wants to distinguish
the effects of tx according to the order that the
event occurs (i.e., by stratum #), then one of
the three SC approaches should be preferred. So,
which one?


Conditional 1: time of recurrent
event from entry
into the study

The conditional 1 approach is preferred if the
study goal is to use time of occurrence of each
recurrent event from entry into the study to assess
a subject’s risk for an event of a specific order (i.e.,
as defined by a stratum #) to occur.


Conditional 2: Use time from 

previous event to 
next recurrent 
event 


The conditional 2 approach would be preferred if
the time interval of interest is the time (reset from
0) from the previous event to the next recurrent
event rather than time from study entry until each
recurrent event.


Marginal: Consider strata as 

representing different 
event types 


Finally, the marginal approach is recommended 
if the investigator wants to consider the events 
occurring at different orders as different types of 
events, for example different disease conditions. 


Conditional 1 versus Marginal 
(subtle choice) 

Recommend: Choose conditional 1 
unless strata 
represent different 
event types 


We (the authors) consider the choice between the
conditional 1 and marginal approaches to be
quite subtle. We prefer conditional 1, provided
the different strata do not clearly represent different
event types. If, however, the strata clearly
indicate separate event processes, we would recommend
the marginal approach.


What do we conclude about tx?

Conclusions based on results from
CP and conditional 1 approaches

Overall, based on the above discussion, we think
that the CP approach is an acceptable method to
use for analyzing the bladder cancer data. If we
had to choose one of the three SC approaches as
an alternative, we would choose the conditional
1 approach, particularly because the order of recurrent
events that define the strata doesn’t clearly
distinguish separate disease processes.

Table 8.13. Comparison of Results
Obtained from No-Interaction
Models Across Two Methods for
Bladder Cancer Data

                           Counting          Conditional
                           process           1
Parameter estimate         -0.407            -0.334
Robust standard error       0.2418            0.1971
Wald chi-square             2.8338            2.8777
p-value                     0.0923            0.0898
Hazard ratio                0.667             0.716
95% confidence interval    (0.414, 1.069)    (0.486, 1.053)




Table 8.13 summarizes the results for the CP and
conditional 1 approaches with regard to the effect
of the treatment variable (tx), adjusted for the control
variables num and size. We report results only
for the no-interaction models, because the interaction
SC model for the conditional 1 approach was
found (using LR test) to be not significantly different
from the no-interaction model.

The results are quite similar for the two different
approaches. There appears to be a small effect
of tx on survival from bladder cancer: HR(CP) =
0.667 = 1/1.50, HR(C1) = 0.716 = 1/1.40. This
effect is borderline nonsignificant (2-sided tests):
P(CP) = .09 = P(C1). 95% confidence intervals
around the hazard ratios are quite wide, indicating
an imprecise estimate of effect.

Overall, therefore, these results indicate that there 
is no strong evidence that tx is effective (after 
controlling for num and size) based on recurrent 
event survival analyses of the bladder cancer data. 


X. A Parametric Approach 
Using Shared Frailty 

Compared 4 approaches in previous 
section 

• Each used a Cox model 

• Robust standard errors 

o Adjusts for correlation from 
same subject 


In the previous section we compared results obtained
from using four analytic approaches on
the recurrent event data from the bladder cancer
study. Each of these approaches used a Cox model.
Robust standard errors were included to adjust for
the correlation among outcomes from the same
subject.


We now present a parametric approach

• Weibull PH model 

• Gamma shared frailty 
component 

• Bladder Cancer dataset 

o Data layout for the counting 
process approach 

Can review Chapter 7 
Weibull model (Section VI) 

Frailty models (Section XII) 


In this section we present a parametric approach
for analyzing recurrent event data that includes
a frailty component. Specifically, a Weibull PH
model with a gamma distributed shared frailty
component is shown using the Bladder Cancer
dataset. The data layout is the same as described
for the counting process approach. It is recommended
that the reader first review Chapter 7, particularly
the sections on Weibull models (Section
VI) and frailty models (Section XII).

Hazard conditioned on frailty $\alpha_k$

$h_{jk}(t \mid \alpha, X_{jk}) = \alpha_k\, h(t \mid X_{jk})$

where $\alpha \sim \text{gamma}(\mu = 1, \text{var} = \theta)$
and where $h(t \mid X_{jk}) = \lambda_{jk}\, p\, t^{p-1}$
(Weibull) with $\lambda_{jk} = \exp(\beta_0 +
\beta_1 tx_{jk} + \beta_2 num_{jk} + \beta_3 size_{jk})$


Including shared frailty 

• Accounts for unobserved factors 
o Subject specific 
o Source of correlation 
o Observations clustered by 
subject 


Robust standard errors 

• Adjusts standard errors 

• Does not affect coefficient 
estimates 

Shared frailty 

• Built into model 

• Can affect coefficient estimates 
and their standard errors 


Weibull regression (PH form)
Gamma shared frailty
Log likelihood = -184.73658

_t         Coef.    Std. Err.      z      P>|z|
tx         -.458      .268       -1.71    0.087
num         .184      .072        2.55    0.011
size       -.031      .091       -0.34    0.730
_cons     -2.952      .417       -7.07    0.000
/ln_p      -.119      .090       -1.33    0.184
/ln_the    -.725      .516       -1.40    0.160
p           .888      .080
1/p        1.13       .101
theta       .484      .250

Likelihood ratio test of theta = 0:
chibar2(01) = 7.34
Prob >= chibar2 = 0.003


We define the model in terms of the hazard of the
jth outcome occurrence for the kth subject conditional
on his or her frailty α_k. The frailty is a
multiplicative random effect on the hazard function
h(t|X_jk), assumed to follow a gamma distribution
of mean 1 and variance theta (θ). We assume
h(t|X_jk) follows a Weibull distribution (shown at
left).
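
As an illustration of how the frailty enters the model, the sketch below evaluates the conditional Weibull hazard using the point estimates from the output shown in this section. The frailty value (1.5), the covariate values, and the time points are hypothetical, and the names `COEF`, `P_SHAPE`, and `conditional_hazard` are introduced only for this example.

```python
import math

# Point estimates taken from the output shown in this section.
COEF = {"_cons": -2.952, "tx": -0.458, "num": 0.184, "size": -0.031}
P_SHAPE = 0.888


def conditional_hazard(t, alpha, tx, num, size):
    """alpha * lambda * p * t**(p - 1), with lambda = exp(linear predictor)."""
    lam = math.exp(COEF["_cons"] + COEF["tx"] * tx
                   + COEF["num"] * num + COEF["size"] * size)
    return alpha * lam * P_SHAPE * t ** (P_SHAPE - 1)


# Same (hypothetical) frailty and covariates, treatment versus placebo.
treated = conditional_hazard(t=12, alpha=1.5, tx=1, num=1, size=1)
placebo = conditional_hazard(t=12, alpha=1.5, tx=0, num=1, size=1)
print(round(treated / placebo, 3))

# Because the estimated shape is below 1, the hazard falls as t increases.
later = conditional_hazard(t=24, alpha=1.5, tx=0, num=1, size=1)
print(later < placebo)
```

Because the frailty multiplies the hazard, it cancels in the ratio of two hazards that share the same frailty, leaving exp of the tx coefficient.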

The frailty is included in the model to account
for variability due to unobserved subject-specific
factors that are otherwise unaccounted for by
the other predictors in the model. These unobserved
subject-specific factors can be a source
of within-subject correlation. We use the term
shared frailty to indicate that observations are
clustered by subject and each cluster (i.e., subject)
shares the same level of frailty.
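
A gamma frailty with mean 1 and variance theta can be simulated to see what "level of frailty" means across subjects. The sketch below assumes the usual parameterization (shape 1/theta, scale theta) and uses the theta estimate from the output in this section purely as an illustrative input; the seed and sample size are arbitrary.

```python
import random
import statistics

THETA = 0.484  # frailty variance estimate reported in the output

# A gamma distribution with shape 1/theta and scale theta has mean 1 and
# variance theta.
rng = random.Random(2024)  # fixed seed so the sketch is reproducible
frailties = [rng.gammavariate(1 / THETA, THETA) for _ in range(100_000)]

mean_frailty = statistics.fmean(frailties)
var_frailty = statistics.pvariance(frailties)
print(round(mean_frailty, 2), round(var_frailty, 2))
```

Each simulated value plays the role of one subject's frailty, which would multiply the hazard for every one of that subject's observations.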

In the previous sections, we have used robust variance
estimators to adjust the standard errors of
the coefficient estimates to account for within-subject
correlation. Shared frailty is not only an
adjustment, but also is built into the model and
can have an impact on the estimated coefficients
as well as their standard errors.


The model output (obtained using Stata version
7) is shown on the left. The inclusion of frailty in
a model (shared or unshared) leads to one additional
parameter estimate in the output (theta, the
variance of the frailty). A likelihood ratio test for
theta = 0 yields a statistically significant p-value
of 0.003 (bottom of output), suggesting that the
frailty component contributes to the model and
that there is within-subject correlation.

The estimate for the Weibull shape parameter p
is 0.888, suggesting a slightly decreasing hazard
over time because p < 1. However, the Wald test
for ln(p) = 0 (or equivalently p = 1) yields a
nonsignificant p-value of 0.184.

Comparing Hazard Ratios

Weibull with frailty model

HR(tx) = exp(-0.458) = 0.633
95% CI = exp[-0.458 ± 1.96(0.268)]
       = (0.374, 1.070)


Counting processes approach with 
Cox model 


An estimated hazard ratio of 0.633 for the effect
of treatment comparing two individuals with the
same level of frailty and controlling for the other
covariates is obtained by exponentiating the estimated
coefficient (-0.458) for tx. The estimated
hazard ratio and 95% confidence intervals are similar
to the corresponding results obtained using
a counting processes approach with a Cox model
and robust standard errors (see left).
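
The hazard ratio and its confidence interval follow directly from the coefficient and standard error in the output. A minimal sketch, with the hypothetical helper name `hazard_ratio_ci` and the usual large-sample multiplier 1.96:

```python
import math


def hazard_ratio_ci(beta, se, z=1.96):
    """Hazard ratio exp(beta) with a Wald-type 95% confidence interval."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))


# Weibull model with gamma shared frailty: coefficient and SE for tx.
hr, lower, upper = hazard_ratio_ci(-0.458, 0.268)
print(round(hr, 3), round(lower, 3), round(upper, 3))
```

The interval is computed on the log-hazard scale and then exponentiated, so it is not symmetric about the hazard ratio.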


HR(tx) = exp(-0.407) = 0.667

95% CI for HR(tx) (robust): (0.414,
1.069)


Interpretations of HR from frailty 
model 

• Compares 2 individuals with 
same a 

• Compares individual with 
himself 

o What is effect if individual 
had used treatment rather 
than placebo? 


Another interpretation for the estimated hazard
ratio from the frailty model involves the comparison
of an individual to himself. In other words, this
hazard ratio describes the effect on an individual’s
hazard (i.e., conditional hazard) if that individual
had used the treatment rather than the placebo.


XI. A Second Example 

Age-Related Eye Disease 
Study (AREDS) 

Outcome 

Age-related macular degeneration 
(AMD) 


We now illustrate the analysis of recurrent event
survival data using a new example. We consider a
subset of data from the Age-Related Eye Disease
Study (AREDS), a long-term multicenter, prospective
study sponsored by the U.S. National Eye Institute
of the clinical course of age-related macular
degeneration (AMD) (see AREDS Research
Group, 2003).


Clinical trial 

Evaluate effect of treatment with 
high doses of antioxidants and zinc 
on progression of AMD 

n = 43 (subset of data analyzed 
here) 


In addition to collecting natural history data,
AREDS included a clinical trial to evaluate the
effect of high doses of antioxidants and zinc on
the progression of AMD. The data subset we consider
consists of 43 patients who experienced
ocular events while followed for their baseline
condition, macular degeneration.

Exposure

tx = 1 if treatment, 0 if placebo

8 years of follow-up

The exposure variable of interest was treatment
group (tx), which was coded as 1 for patients randomly
allocated to an oral combination of antioxidants,
zinc, and vitamin C versus 0 for patients
allocated to a placebo. Patients were followed for
eight years.

Two possible events

First event: visual acuity score <50
(i.e., poor vision)

Each patient could possibly experience two events.
The first event was defined as the sudden decrease
in visual acuity score below 50 measured at scheduled
appointment times. Visual acuity score was
defined as the number of letters read on a standardized
visual acuity chart at a distance of 4 meters,
where the higher the score, the better the
vision.

Second event: clinically advanced
severe stage of macular degeneration

The second event was considered a successive
stage of the first event and defined as a clinically
advanced and severe stage of macular degeneration.
Thus, the subject had to experience the first
event before he or she could experience the second
event.

4 approaches for analyzing recurrent
event survival data carried out
on macular degeneration data

Each model contains tx, age, and
sex.

We now describe the results of using the four approaches
for analyzing recurrent event survival
with these data. In each analysis, two covariates,
age and sex, were controlled, so that each model
contained the variables tx, age, and sex.

CP model

$h(t, X) = h_0(t) \exp[\beta\, tx + \gamma_1\, age + \gamma_2\, sex]$

The counting process (CP) model is shown here at
the left together with both the no-interaction and
interaction SC models used for the three stratified
Cox (SC) approaches.

No-interaction SC model

$h_g(t, X) = h_{0g}(t) \exp[\beta\, tx + \gamma_1\, age + \gamma_2\, sex]$

where g = 1, 2


Interaction SC model:

$h_g(t, X) = h_{0g}(t) \exp[\beta_g\, tx + \gamma_{1g}\, age + \gamma_{2g}\, sex]$

where g = 1, 2

Table 8.14. Comparison of
Parameter Estimates and Robust
Standard Errors for Treatment
Variable (tx) Controlling for Age
and Sex (Macular Degeneration
Data)

                   “Interaction” Cox           “No-interaction”
                   stratified model            SC model
                   Stratum 1    Stratum 2
                      β1           β2               β
Model                (SE)         (SE)             (SE)
Counting process      n/a          n/a           -0.174
                                                 (0.104)
Cond 1             -0.055       -0.955           -0.306
                   (0.286)      (0.443)          (0.253)
Cond 2             -0.055       -1.185           -0.339
                   (0.286)      (0.555)          (0.245)
Marginal           -0.055       -0.861           -0.299
                   (0.286)      (0.465)          (0.290)


In Table 8.14, we compare the coefficient estimates
and their robust standard errors for the treatment
variable (tx) from all four approaches. This table
shows results for both the “interaction” and
“no-interaction” stratified Cox models for the three
approaches other than the counting process approach.

Notice that the estimated coefficients for β1 and
their corresponding standard errors are identical
for the three SC approaches. This will always be
the case for the first stratum regardless of the data
set being considered.

The estimated coefficients for β2 are, as expected,
somewhat different for the three SC approaches.
We return to these results shortly.


Interaction SC models are preferred
(based on LR test results) to use of
no-interaction SC model

LR tests for comparing the “no-interaction” with
the “interaction” SC models were significant (P <
.0001) for all three SC approaches (details not
shown), indicating that an interaction model was
more appropriate than a no-interaction model for
each approach.

Table 8.15. Comparison of Results
for the Treatment Variable (tx)
Obtained for Conditional 1 and
Marginal Approaches (Macular
Degeneration Data)

                                 Cond 1            Marginal
Estimate          β1            -0.0555           -0.0555
                  β2            -0.9551           -0.8615
                  β             -0.306            -0.2989
Robust std.       SE(β1)         0.2857            0.2857
error             SE(β2)         0.4434            0.4653
                  SE(β)          0.2534            0.2902
Wald              H0: β1 = 0     0.0378            0.0378
chi-square        H0: β2 = 0     4.6395            3.4281
                  H0: β = 0      1.4569            1.0609
P-value           H0: β1 = 0     0.8458            0.8458
                  H0: β2 = 0     0.0312            0.0641
                  H0: β = 0      0.2274            0.3030
Hazard ratio      exp(β1)        0.946             0.946
                  exp(β2)        0.385             0.423
                  exp(β)         0.736             0.742
95% Conf.         exp(β1)       (0.540, 1.656)    (0.540, 1.656)
interval          exp(β2)       (0.161, 0.918)    (0.170, 1.052)
                  exp(β)        (0.448, 1.210)    (0.420, 1.310)


In Table 8.15, we summarize the statistical inference
results for the effect of the treatment variable
(tx) for the conditional 1 and marginal approaches
only.

We have not included the CP results here because
the two events being considered are of very different
types, particularly regarding severity of illness,
whereas the CP approach treats both events
as identical replications. We have not considered
the conditional 2 approach because the investigators
were more likely interested in survival time
from baseline entry into the study than the survival
time “gap” from the first to second event.

Because we previously pointed out that the interaction
SC model was found to be significant when
compared to the corresponding no-interaction SC
model, we focus here on the treatment (tx) effect
for each stratum (i.e., event) separately.

Based on the Wald statistics and corresponding
P-values for testing the effect of the treatment on
survival to the first event (i.e., $H_0$: $\beta_1 = 0$), both the
conditional 1 and marginal approaches give the
identical result that the estimated treatment effect
(HR = 0.946 = 1/1.06) is neither meaningful nor
significant (P = 0.85).

For the second event, indicating a clinically severe
stage of macular degeneration, the Wald statistic
P-value for the conditional 1 approach is 0.03,
which is significant at the .05 level, whereas the
corresponding P-value for the marginal approach
is 0.06, borderline nonsignificant at the .05 level.

The estimated HR for the effect of the treatment
is HR = 0.385 = 1/2.60 using the conditional 1
approach and its 95% confidence interval is quite
wide but does not contain the null value of 1.
For the marginal approach, the estimated HR
is HR = 0.423 = 1/2.36, also with a wide confidence
interval, but one that includes 1.

Conclusions regarding 1st event:

• No treatment effect

• Same for conditional 1 and
marginal approaches


Conclusions regarding 2nd event:

• Clinically moderate and
statistically significant
treatment effect

• Similar for conditional 1 and
marginal approaches, but more
support from conditional 1
approach

Comparison of conditional 1 with 
marginal Approach 

What if results had been different? 


Recommend conditional 1 if

Can assume 2nd event cannot occur
without 1st event previously
occurring

Should consider survival time to
2nd event conditional on
experiencing 1st event


Thus, based on the above results, there appears to
be no effect of treating patients with high doses
of antioxidants and zinc on reducing visual acuity
score below 50 (i.e., the first event) based on
either conditional 1 or marginal approaches to
the analysis.

However, there is evidence of a clinically moderate
and statistically significant effect of the treatment
on protection (i.e., not failing) from the
second more severe event of macular degeneration.
This conclusion is more strongly supported by the
conditional 1 analysis than by the marginal
analysis.
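
The inferential quantities in Table 8.15 all follow from each estimated coefficient and its robust standard error. The sketch below recomputes them for the second-event treatment effect under both approaches. The function name `wald_summary` and the use of a normal-theory critical value of 1.96 are illustrative assumptions, not part of the original analysis output.

```python
import math


def wald_summary(beta, se, z_crit=1.96):
    """Wald chi-square, two-sided P-value, hazard ratio, and 95% CI for one coefficient."""
    z = beta / se
    chi_square = z ** 2
    # Two-sided P-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    hazard_ratio = math.exp(beta)
    ci = (math.exp(beta - z_crit * se), math.exp(beta + z_crit * se))
    return chi_square, p_value, hazard_ratio, ci


# Second-event treatment coefficient and robust SE, copied from Table 8.15
results = {
    "conditional 1": wald_summary(-0.9551, 0.4434),
    "marginal": wald_summary(-0.8615, 0.4653),
}

for name, (chi2, p, hr, (lower, upper)) in results.items():
    print(f"{name}: Wald = {chi2:.2f}, P = {p:.3f}, "
          f"HR = {hr:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Up to rounding, the results should agree with the table entries: the conditional 1 interval excludes 1, whereas the marginal interval does not.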


Despite similar conclusions from both approaches,
it still remains to compare the two approaches
for these data. In fact, if the results from
each approach had been very different, it would
be important to make a choice between these
approaches.

Nevertheless, we authors find it difficult to make
such a decision, even for this example. The
conditional 1 approach would seem appropriate if
the investigators assumed that the second event
cannot occur without the first event previously
occurring. If so, it would be important to consider
survival time to the second event only for (i.e.,
conditional on) those subjects who experience a first
event.


Recommend marginal if

Can assume each subject at risk for
2nd event whether or not 1st
event previously occurred

⇓

2nd event considered a separate
event, that is, unconditional of
the 1st event


On the other hand, the marginal approach would 
seem appropriate if each subject is considered to 
be at risk for the second event whether or not 
the subject experiences the first event. The second 
event is therefore considered separate from (i.e., 
unconditional of) the first event, so that survival 
times to the second event need to be included for 
all subjects, as in the marginal approach. 


⇓

Should consider survival times to
2nd event for all subjects

Macular degeneration data: recommend
marginal approach

In general: carefully consider
interpretation of each approach


For the macular degeneration data example, we
find the marginal approach persuasive. However,
in general, the choice among all four approaches
is not often clear-cut and requires careful consideration
of the different interpretations that can be
drawn from each approach.


XII. Survival Curves with 
Recurrent Events 


Goal: Plot and Interpret
Survival Curves

Types of survival curves:

KM (empirical): Chapter 2
Adjusted (Cox PH): Chapters 3 and 4


An important goal of most survival analyses,
whether or not a regression model (e.g., Cox PH)
is involved, is to plot and interpret/compare survival
curves for different groups. We have previously
described the Kaplan-Meier (KM) approach
for plotting empirical survival curves (Chapter 2)
and we have also described how to obtain adjusted
survival curves for Cox PH models (Chapters 3
and 4).


Previously: 1 (nonrecurrent) event 
Now: 

Survival plots with recurrent 
events? 


This previous discussion only considered survival 
data for the occurrence of one (nonrecurrent) 
event. So, how does one obtain survival plots when 
there are recurrent events? 


Focus on one ordered event at a time

$S_1(t)$: 1st event
$S_2(t)$: 2nd event
$S_k(t)$: kth event


The answer is that survival plots with recurrent
events only make sense when the focus is on one
ordered event at a time. That is, we can plot a survival
curve for survival to a first event, survival
to a second event, and so on.


Survival to a 1st event

$S_1(t) = \Pr(T_1 > t)$
where

$T_1$ = survival time up to
occurrence of 1st event
(ignores later recurrent events)


For survival to a first event, the survival curve
describes the probability that a subject’s time to
occurrence of a first event will exceed a specified
time. Such a plot essentially ignores any recurrent
events that a subject may have after a first event.

Survival to a 2nd event

$S_2(t) = \Pr(T_2 > t)$
where

$T_2$ = survival time up to
occurrence of 2nd event


For survival to a second event, the survival curve
describes the probability that a subject’s time to
occurrence of a second event will exceed a specified
time.


Two versions


There are two possible versions for such a plot.


Conditional:

$T_{2c}$ = time from 1st event to 2nd
event, restricting data to 1st
event subjects

Marginal:

$T_{2m}$ = time from study entry to 2nd
event, ignoring 1st event


Conditional: use survival time from time of first
event until occurrence of second event, thus
restricting the dataset to only those subjects
who experienced a first event.

Marginal: use survival time from study entry to
occurrence of second event, ignoring whether a
first event occurred.


Survival to a kth event ($k \ge 2$)

$S_k(t) = \Pr(T_k > t)$
where

$T_k$ = survival time up to
occurrence of kth event


Similarly, for survival to the kth event, the survival
curve describes the probability that a subject’s
time to occurrence of the kth event will exceed
a specified time.


Two versions

Conditional:

$T_{kc}$ = time from the (k - 1)st to kth
event, restricting data to
subjects with k - 1 events

Marginal:

$T_{km}$ = time from study entry to kth
event, ignoring previous
events


As with survival to the second event, there are two
possible versions, conditional or marginal, for
such a plot, as shown on the left.


EXAMPLE

ID    Status    Stratum    Start (days)    Stop (days)    tx
M     1         1          0               100            1
M     1         2          100             105            1
H     1         1          0               30             0
H     1         2          30              50             0
P     1         1          0               20             0
P     1         2          20              60             0
P     1         3          60              85             0

We now illustrate such survival plots for recurrent
event data by returning to the small dataset previously
described for three subjects Molly (M), Holly
(H), and Polly (P), shown again on the left.

Deriving $S_1(t)$: Stratum 1

$t_{(j)}$    $n_j$    $m_j$    $q_j$    $R(t_{(j)})$    $S_1(t_{(j)})$
0        3      0      0      {M, H, P}     1.00
20       3      1      0      {M, H, P}     0.67
30       2      1      0      {M, H}        0.33
100      1      1      0      {M}           0.00

Deriving $S_{2c}(t)$: Stratum 2 (Conditional)

$t_{(j)}$    $n_j$    $m_j$    $q_j$    $R(t_{(j)})$    $S_{2c}(t_{(j)})$
0        3      0      0      {M, H, P}     1.00
5        3      1      0      {M, H, P}     0.67
20       2      1      0      {H, P}        0.33
40       1      1      0      {P}           0.00

Deriving $S_{2m}(t)$: Stratum 2 (Marginal)

$t_{(j)}$    $n_j$    $m_j$    $q_j$    $R(t_{(j)})$    $S_{2m}(t_{(j)})$
0        3      0      0      {M, H, P}     1.00
50       3      1      0      {M, H, P}     0.67
60       2      1      0      {M, P}        0.33
105      1      1      0      {M}           0.00


The survival plot for survival to the first event, $S_1(t)$,
is derived from the stratum 1 data layout for any
of the three alternative SC analysis approaches.
Recall that $m_j$ and $q_j$ denote the number of failures
and censored observations at time $t_{(j)}$. The
survival probabilities in the last column use the
KM product limit formula.
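
As a worked instance of the product limit formula for stratum 1, using the first-event times of 20 (Polly), 30 (Holly), and 100 (Molly) days with no censoring, the survival probabilities can be written out step by step. The hat notation is added here only to mark these as estimates.

```latex
\begin{align*}
\hat{S}_1(t_{(j)}) &= \prod_{i \le j} \frac{n_i - m_i}{n_i} \\
\hat{S}_1(20)  &= \frac{3 - 1}{3} = 0.67 \\
\hat{S}_1(30)  &= \frac{3 - 1}{3} \times \frac{2 - 1}{2} = 0.33 \\
\hat{S}_1(100) &= \frac{3 - 1}{3} \times \frac{2 - 1}{2} \times \frac{1 - 1}{1} = 0.00
\end{align*}
```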


The conditional survival plot for survival to the
second event is derived from the stratum 2 data
layout for the conditional 2 approach. We denote
this survival curve as $S_{2c}(t)$. Notice that the survival
probabilities here are identical to those in
the previous table; however, the failure times $t_{(j)}$
in each table are different.


The marginal survival plot for survival to the second
event is derived from the stratum 2 data layout
for the marginal approach. We denote this survival
curve as $S_{2m}(t)$. Again, the last column here
is identical to those in the previous two tables, but,
once again, the failure times $t_{(j)}$ in each table are
different.
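
The three derivations can be checked with a few lines of code. The following is a minimal sketch of the product limit calculation applied to the Molly, Holly, and Polly data. The function name `kaplan_meier` and the tuple layout are illustrative choices, and the gap times and study-entry times are computed directly from the (Start, Stop) columns of the example.

```python
def kaplan_meier(times, events):
    """Return (t, n at risk, failures, S(t)) at each distinct failure time."""
    rows = []
    surv = 1.0
    failure_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in failure_times:
        n = sum(1 for u in times if u >= t)  # size of the risk set at t
        m = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= (n - m) / n
        rows.append((t, n, m, round(surv, 2)))
    return rows


# (id, stratum, start, stop, status) for Molly, Holly, and Polly
data = [
    ("M", 1, 0, 100, 1), ("M", 2, 100, 105, 1),
    ("H", 1, 0, 30, 1), ("H", 2, 30, 50, 1),
    ("P", 1, 0, 20, 1), ("P", 2, 20, 60, 1), ("P", 3, 60, 85, 1),
]

first = [r for r in data if r[1] == 1]
second = [r for r in data if r[1] == 2]

# Survival to 1st event: time from study entry to first event
s1 = kaplan_meier([r[3] for r in first], [r[4] for r in first])
# Conditional: time from 1st event to 2nd event (Stop minus Start)
s2c = kaplan_meier([r[3] - r[2] for r in second], [r[4] for r in second])
# Marginal: time from study entry to 2nd event (Stop only)
s2m = kaplan_meier([r[3] for r in second], [r[4] for r in second])

for label, table in (("S1", s1), ("S2c", s2c), ("S2m", s2m)):
    print(label, table)
```

All three subjects had a second event, so the marginal version needs no extra censored lines here; in a larger dataset, subjects without a second event would also have to be included for the marginal curve.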


Survival Plots for Molly, Holly
and Polly Recurrent Event
Data (n = 3)


The survival plots that correspond to the above
three data layouts are shown in Figures 8.1 to 8.3.


Figure 8.1 shows survival probabilities for the first
event, ignoring later events. The risk set at time
zero contains all three subjects. The plot drops
from $S_1(t) = 1$ to $S_1(t) = 0.67$ at t = 20, drops
again to $S_1(t) = 0.33$ at t = 30 and falls to $S_1(t) = 0$
at t = 100 when the latest first event occurs.


Figure 8.1. $S_1(t)$: Survival to 1st Event

Figure 8.2. $S_{2c}(t)$: Survival to 2nd
Event (Conditional)


Figure 8.2 shows conditional survival probabilities
for the second event using survival time from
the first event to the second event. Because all
three subjects had a first event, the risk set at time
zero once again contains all three subjects. Also,
the survival probabilities of 1, 0.67, 0.33, and 0 are
the same as in Figure 8.1. Nevertheless, this plot
differs from the previous plot because the survival
probabilities are plotted at different survival times
(t = 5, 20, 40 in Figure 8.2 instead of t = 20, 30,
100 in Figure 8.1).


Figure 8.3 shows marginal survival probabilities
for the second event using survival time from
study entry to the second event, ignoring the
first event. The survival probabilities of 1, 0.67,
0.33, and 0 are once again the same as in Figures
8.1 and 8.2. Nevertheless, this plot differs from the
previous two plots because the survival probabilities
are plotted at different survival times (t = 50,
60, 105 in Figure 8.3).


Figure 8.3. $S_{2m}(t)$: Survival to 2nd
Event (Marginal)


XIII. Summary 


4 approaches for recurrent event
data

Counting process (CP),
conditional 1, conditional 2,
marginal

The 4 approaches

• Differ in how risk set is
determined

• Differ in data layout

• All involve standard Cox model
program

• Latter 3 approaches use a SC
model


We have described four approaches for analyzing
recurrent event survival data.

These approaches differ in how the risk set is determined
and in data layout. All four approaches involve
using a standard computer program that fits
a Cox PH model, with the latter three approaches
requiring a stratified Cox model, stratified by the
different events that occur.
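
To make the difference in risk sets concrete, the sketch below evaluates who is at risk under each approach for the Molly, Holly, and Polly data. The function names are illustrative, and the rules encoded are a simplified reading of the four layouts: CP and conditional 1 use the (start, stop] interval, conditional 2 uses the gap time, and marginal uses the time from study entry within a stratum.

```python
# (id, stratum, start, stop) in the counting process layout, times in days
ROWS = [
    ("M", 1, 0, 100), ("M", 2, 100, 105),
    ("H", 1, 0, 30), ("H", 2, 30, 50),
    ("P", 1, 0, 20), ("P", 2, 20, 60), ("P", 3, 60, 85),
]


def risk_set_cp(t):
    """CP: at risk if any interval of the subject covers t; strata are ignored."""
    return {i for i, _, start, stop in ROWS if start < t <= stop}


def risk_set_conditional1(t, g):
    """Conditional 1: same (start, stop] rule, restricted to stratum g."""
    return {i for i, s, start, stop in ROWS if s == g and start < t <= stop}


def risk_set_conditional2(t, g):
    """Conditional 2: clock restarts at 0, so compare t with the gap time."""
    return {i for i, s, start, stop in ROWS if s == g and stop - start >= t}


def risk_set_marginal(t, g):
    """Marginal: time measured from study entry within stratum g."""
    return {i for i, s, _, stop in ROWS if s == g and stop >= t}


print("CP, t = 50:", sorted(risk_set_cp(50)))
print("Conditional 1, stratum 2, t = 50:", sorted(risk_set_conditional1(50, 2)))
print("Conditional 2, stratum 2, t = 20:", sorted(risk_set_conditional2(20, 2)))
print("Marginal, stratum 2, t = 50:", sorted(risk_set_marginal(50, 2)))
```

At day 50, Molly has not yet had her first event, so she is excluded from the conditional 1 risk set for the second event but included in the marginal one. The marginal function here omits the extra censored lines that the full marginal layout adds for events a subject never experienced; none are needed for stratum 2 in this small example.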


Identical recurrent events

⇓

CP approach


The approach to analysis typically used when recurrent
events are treated as identical is called the
CP approach.

Recurrent events: different disease
categories or event order important

⇓

Stratified Cox (SC) approaches

CP approach: Start and Stop times

Standard layout: only Stop (survival)
times (no recurrent events)


Conditional 1: same Start and Stop
times as CP, but uses SC model


Conditional 2: Start and Stop times
Start = 0 always
Stop = time since previous event
SC model

Marginal approach:

Standard layout (nonrecurrent
event), that is, without (Start,
Stop) columns

Each failure is a separate process

Recommend using robust estimation
to adjust for correlation of observations
on the same subject.


Application 1: Bladder Cancer
study
n = 86
64 months of follow-up

When recurrent events involve different disease 
categories and/or the order of events is considered 
important, the analysis requires choosing among 
the three alternative SC approaches. 

The data layout for the counting process approach
requires each subject to have a line of data for each
recurrent event and lists the start time and stop
time of the interval of follow-up. This contrasts
with the standard layout for data with no recurrent
events, which lists only the stop (survival) time on
a single line of data for each subject.

The conditional 1 approach uses the exact same
(start, stop) data layout format used for the CP
approach, except that for conditional 1, the model
used is a SC PH model rather than an unstratified
PH model.

The conditional 2 approach also uses a (start,
stop) data layout, but the start value is always 0
and the stop value is the time interval length since
the previous event. The model here is also a SC
model.


The marginal approach uses the standard
(nonrecurrent event) data layout instead of the (start,
stop) layout. The basic idea behind the marginal
approach is that it allows each failure to be
considered as a separate process.
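
The relationship among the layouts can be sketched by converting the counting process rows for Molly, Holly, and Polly into the other two formats. The helper names below are hypothetical. For the marginal layout, the code assumes that a subject with fewer events than the maximum contributes extra lines with status 0 at that subject's last follow-up time, and that strata are numbered consecutively from 1 for each subject.

```python
# Counting process (and conditional 1) layout:
# (id, stratum, start, stop, status), times in days
cp_rows = [
    ("M", 1, 0, 100, 1), ("M", 2, 100, 105, 1),
    ("H", 1, 0, 30, 1), ("H", 2, 30, 50, 1),
    ("P", 1, 0, 20, 1), ("P", 2, 20, 60, 1), ("P", 3, 60, 85, 1),
]


def to_conditional2(rows):
    """Reset the clock: start is always 0, stop is the time since the previous event."""
    return [(subj, g, 0, stop - start, status)
            for subj, g, start, stop, status in rows]


def to_marginal(rows):
    """Standard layout (no start column); every subject gets a line in every stratum."""
    max_events = max(g for _, g, _, _, _ in rows)
    by_subject = {}
    for subj, g, _, stop, status in rows:
        by_subject.setdefault(subj, []).append((g, stop, status))
    out = []
    for subj, recs in by_subject.items():
        recs.sort()
        out.extend((subj, g, stop, status) for g, stop, status in recs)
        last_stop = recs[-1][1]
        # Assumed padding: censored lines for events that never occurred
        for g in range(len(recs) + 1, max_events + 1):
            out.append((subj, g, last_stop, 0))
    return out


conditional2_rows = to_conditional2(cp_rows)
marginal_rows = to_marginal(cp_rows)

for row in marginal_rows:
    print(row)
```

With a maximum of three events in these data, Molly and Holly each gain one censored line for stratum 3, so the marginal layout has nine lines in place of the seven lines of the counting process layout.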


For each of the SC alternative approaches, as for
the CP approach, it is recommended to use robust
estimation to adjust the variances of the estimated
regression coefficients for the correlation
of observations on the same subject.

We considered two applications of the different
approaches described above. First, we compared
results from using all four methods to analyze
data from a study of bladder cancer involving 86
patients, each followed for a variable time up to
64 months.

Repeated event: recurrence of
bladder cancer tumor; up to
4 events


tx = 1 if thiotepa, 0 if placebo
num = initial # of tumors
size = initial size of tumors


CP results: no strong evidence for tx
(HR = 0.67, P = .09, 95% CI: 0.414,
1.069)


Alternative parametric approach

• Weibull PH model

• Gamma shared frailty
component

• Bladder cancer dataset

• Similar HR and confidence
interval as for counting process
approach

Application 2: Clinical trial

n = 43

8 years of follow-up

High doses of antioxidants and zinc

Age-related macular degeneration

Exposure: tx = 1 if treatment, 0 if
placebo

Covariates: age, sex

Two possible events:

1st event: visual acuity score <50
(i.e., poor vision)

2nd event: clinically advanced
severe stage of macular
degeneration

The repeated event analyzed was the recurrence of
a bladder cancer tumor after transurethral surgical
excision. Each recurrence of new tumors was
treated by removal at each examination. About
25% of the 86 subjects experienced four events.

The exposure variable of interest was drug treatment
status (tx, 0 = placebo, 1 = treatment with
thiotepa). There were two covariates: initial number
of tumors (num) and initial size of tumors
(size).

Results for the CP approach, which was considered
appropriate for these data, indicated that
there was no strong evidence that tx is effective
after controlling for num and size (HR = 0.67,
P = .09, 95% CI: 0.414, 1.069).

An alternative approach for analyzing recurrent
event data was also described using a parametric
model containing a frailty component (see Chapter 7).
Specifically, a Weibull PH model with a
gamma distributed frailty was fit using the bladder
cancer dataset. The resulting estimated HR
and confidence interval were quite similar to the
counting process results.


The second application considered a subset of data
(n = 43) from a clinical trial to evaluate the effect
of high doses of antioxidants and zinc on the
progression of age-related macular degeneration
(AMD). Patients were followed for eight years.


The exposure variable of interest was treatment
group (tx). Covariates considered were age and
sex.

Each patient could possibly experience two events.
The first event was defined as the sudden decrease
in visual acuity score below 50. The second event
was considered a successive stage of the first event
and defined as a clinically advanced and severe
stage of macular degeneration.

Focus on conditional 1 vs.
marginal (events were of different
types)


Because the two events were of very different types
and because survival from baseline was of primary
interest, we focused on the results for the conditional 1
and marginal approaches only.


Interaction SC model ✓
No-interaction SC model ✗


Conclusions regarding 1st event 

• No treatment effect 

• Same for conditional 1 and 
marginal approaches 


An interaction SC model was more appropriate 
than a no-interaction model for each approach, 
thus requiring separate results for the two events 
under study. 

The results for the first event indicated no effect 
of the treatment on reducing visual acuity score 
below 50 (i.e., the first event) from either condi¬ 
tional 1 or marginal approaches to the analysis. 


Conclusions regarding 2nd event 

• Clinically moderate and 
statistically significant 
treatment effect 


However, there was evidence of a clinically moderate
and statistically significant effect of the treatment
on the second more severe event of macular
degeneration.


Macular degeneration data: prefer
marginal approach (but not clear-cut)


The choice between the conditional 1 and
marginal approaches for these data was not clear-cut,
although the marginal approach was perhaps
more appropriate because the two events were of
very different types.


In general: carefully consider
interpretation of each approach


In general, however, the choice among all four approaches
requires careful consideration of the different
interpretations that can be drawn from each
approach.


Survival plots: one ordered event at
a time

Two versions for survival to
kth event:

Conditional: only subjects with k - 1 events

Marginal: ignores previous events


Survival plots with recurrent events are derived
one ordered event at a time. For plotting survival
to a kth event where $k \ge 2$, one can use either
a conditional or marginal plot, which typically
differ.

Detailed
Outline


I. Overview

A. Focus: outcome events that may occur more than 
once over the follow-up time for a given subject, 
that is, “recurrent events.” 

B. Counting Process (CP) approach uses the Cox PH 
model. 

C. Alternative approaches that use a Stratified Cox 
(SC) PH model. 

II. Examples of Recurrent Event Data

A. 1. Multiple relapses from remission: leukemia
patients.

2. Repeated heart attacks: coronary patients. 

3. Recurrence of tumors: bladder cancer patients. 

4. Deteriorating episodes of visual acuity: macular 
degeneration patients. 

B. Objective of each example: to assess relationship 
of predictors to rate of occurrence, allowing for 
multiple events per subject. 

C. Different analysis required depending on whether: 

1. Recurrent events are treated as identical 
(counting process approach), or 

2. Recurrent events involve different disease 
categories and/or the order of events is 
important (stratified Cox approaches). 

III. Counting Process Example

A. Data on two hypothetical subjects from a 
randomized trial that compares two treatments for 
bladder cancer tumors. 

B. Data set-up for Counting Process (CP) approach: 

1. Each subject contributes a line of data for each 
time interval corresponding to each recurrent 
event and any additional event-free follow-up 
interval. 

2. Each line of data for a given subject lists the
start time and stop time for each interval of
follow-up.

IV. General Data Layout for Counting Process
Approach

A. $r_i$ time intervals for subject i.

$\delta_{ij}$ event status (0 or 1) for subject i in interval j.
$t_{ij0}$ start time for subject i in interval j.
$t_{ij1}$ stop time for subject i in interval j.
$X_{ijk}$ value of kth predictor for subject i in
interval j.

i = 1, 2, ..., N; j = 1, 2, ..., $r_i$; k = 1, 2, ..., p.

B. Layout for subject i:

i    j        $\delta_{ij}$      $t_{ij0}$       $t_{ij1}$       $X_{ij1}$ ... $X_{ijp}$
i    1        $\delta_{i1}$      $t_{i10}$       $t_{i11}$       $X_{i11}$ ... $X_{i1p}$
i    2        $\delta_{i2}$      $t_{i20}$       $t_{i21}$       $X_{i21}$ ... $X_{i2p}$
...
i    $r_i$    $\delta_{ir_i}$    $t_{ir_i0}$     $t_{ir_i1}$     $X_{ir_i1}$ ... $X_{ir_ip}$

C. Bladder Cancer Study example: 

1. Data layout provided for the first 26 subjects 
(86 subjects total) from a 64-month study of 
recurrent bladder cancer tumors. 

2. The exposure variable: drug treatment status 
(tx, 0 = placebo, 1 = treatment with thiotepa). 

3. Covariates: initial number of tumors (num) and 
initial size of tumors (size). 

4. Up to 4 events per subject. 

V. The Counting Process Model and Method

A. The model typically used to carry out the
Counting Process (CP) approach is the standard
Cox PH model: $h(t, X) = h_0(t) \exp[\sum \beta_i X_i]$.

B. For recurrent event survival data, the (partial)
likelihood function is formed differently than for
nonrecurrent event survival data:

1. A subject who continues to be followed after
having failed at $t_{(j)}$ does not drop out of the risk
set after $t_{(j)}$ and remains in the risk set until his
or her last interval of follow-up, after which the
subject is removed from the risk set.

2. Different lines of data contributed by the same
subject are treated in the analysis as if they were
independent contributions from different
subjects.

C. For the bladder cancer data, the Cox PH Model for
CP approach is given by

$h(t, X) = h_0(t) \exp[\beta\,\mathrm{tx} + \gamma_1\,\mathrm{num} + \gamma_2\,\mathrm{size}]$.

D. The overall partial likelihood L from using the CP
approach will be automatically determined by the
computer program used once the data layout is in
the correct CP form and the program code used
involves the (start, stop) formulation.

VI. Robust Estimation

A. In the CP approach, the different intervals 
contributed by a given subject represent correlated 
observations on the same subject that must be 
accounted for in the analysis. 

B. A widely used technique for adjusting for the 
correlation among outcomes on the same subject 
is called robust estimation. 

C. The goal of robust estimation for the CP 
approach is to obtain variance estimators that 
adjust for correlation within subjects when 
previously no such correlation was assumed. 

D. The robust estimator of the variance of an 
estimated regression coefficient allows tests of 
hypotheses and confidence interval estimation 
about model parameters to account for correlation 
within subjects. 

E. The general form of the robust estimator can be 
most conveniently written in matrix notation; this 
formula is incorporated into the computer 
program and is automatically calculated by the 
program with appropriate coding. 

VII. Results for CP Example

A. Edited output is provided from fitting the 
no-interaction Cox PH model involving the three 
predictors tx, num, and size. 

B. A likelihood ratio chunk test for interaction terms 
tx x num and tx x size was nonsignificant. 

C. The PH assumption was assumed satisfied for all 
three variables. 

D. The robust estimator of 0.2418 for the standard
error of the tx coefficient was similar to, though
somewhat larger than, the corresponding nonrobust
estimator of 0.2001.

E. There was not strong evidence that tx is effective
after controlling for num and size (HR = 0.67,
two-sided P = .09, 95% CI: 0.414, 1.069).

F. However, for a one-sided alternative, the P-values
using both robust and nonrobust standard errors
were significant at the .05 level.

G. The 95% confidence interval using the robust
variance estimator is quite wide.

VIII. Other Approaches: Stratified Cox

A. The “strata” variable for each of the three SC 
approaches treats the time interval number for 
each event occurring on a given subject as a 
stratified variable. 

B. Three alternative approaches involving SC models 
need to be considered if the investigator wants to 
distinguish the order in which recurrent events 
occur. 

C. These approaches all differ from what is called 
competing risk survival analysis in that the latter 
allows each subject to experience only one of 
several different types of events over follow-up. 

D. Conditional 1 approach: 

1. Same Start and Stop Times as CP. 

2. SC model. 

E. Conditional 2 approach: 

1. Start and Stop Times, but Start = 0 always and 
Stop = time since previous event. 

2. SC model. 

F. Marginal approach: 

1. Uses standard layout (nonrecurrent event); no 
(Start, Stop) columns. 

2. Treats each failure as a separate process.

3. Each subject at risk for all failures that might
occur, so that # actual failures ≤ # possible
failures.

4. SC model.

G. Must decide between two types of SC models:

1. No-interaction SC versus interaction SC.

2. Bladder cancer example:

No-interaction model: $h_g(t, X) = h_{0g}(t) \exp[\beta\,\mathrm{tx} + \gamma_1\,\mathrm{num} + \gamma_2\,\mathrm{size}]$,
where g = 1, 2, 3, 4.

Interaction model: $h_g(t, X) = h_{0g}(t) \exp[\beta_g\,\mathrm{tx} + \gamma_{1g}\,\mathrm{num} + \gamma_{2g}\,\mathrm{size}]$,
where g = 1, 2, 3, 4.

H. Recommend using robust estimation to adjust for
correlation of observations on the same subject.

IX. Bladder Cancer Study Example (Continued)

A. Results from using all four methods (CP,
conditional 1, conditional 2, and marginal) on
the bladder cancer data were compared.

B. The hazard ratio for the effect of tx based on a
no-interaction model differed somewhat for each
of the four approaches, with the marginal model
being most different:

M: 0.560 CP: 0.666 C1: 0.716 C2: 0.763

C. The nonrobust and robust standard errors and
P-values differed to some extent for each of the
different approaches.

D. Using an interaction SC model, the estimated $\beta$s
and corresponding standard errors are different
over the four strata (i.e., four events) for each
model separately.

E. The estimated $\beta_1$s and corresponding standard
errors for the three alternative SC models are
identical, as expected (always for first events).

F. Which of the four recurrent event analysis 
approaches is best? 

1. Recommend CP approach if do not want to 
distinguish between recurrent events on the 
same subject and desire overall conclusion 
about the effect of tx. 

2. Recommend one of the three SC approaches if 
want to distinguish the effect of tx according to 
the order in which the event occurs. 

3. The choice between the conditional 1 and 
marginal is difficult, but prefer conditional 1 
because the strata do not clearly represent 
different event types. 

G. Overall, regardless of the approach used, there was 
no strong evidence that tx is effective after 
controlling for num and size. 

X. A Parametric Approach Using Shared Frailty

A. Alternative approach using a parametric model 
containing a frailty component (see Chapter 7). 

B. Weibull PH model with a gamma distributed 
frailty was fit using the bladder cancer dataset. 

C. Estimated HR and confidence interval were quite 
similar to the counting process results. 

D. Estimated frailty component was significant
(P = 0.003).

XI. A Second Example

A. Clinical trial (n = 43, 8-year study) on effect of 
using high doses of antioxidants and zinc (i.e., 
tx = 1 if yes, 0 if no) to prevent age-related 
macular degeneration. 

B. Covariates: age and sex. 

C. Two possible events: 

1. First event: visual acuity score <50 (i.e., poor 
vision). 

2. Second event: clinically advanced stage of 
macular degeneration. 

D. Focus on conditional 1 vs. marginal because 
events are of different types. 

E. Interaction SC model significant when compared 
to no-interaction SC model. 

F. Conclusions regarding 1st event: 

1. No treatment effect (HR = 0.946, P = 0.85). 

2. Same for conditional 1 and marginal 
approaches. 

G. Conclusions regarding 2nd event: 

1. Conditional 1: HR = .385 = 1/2.60, two-sided 
P = .03. 

2. Marginal: HR = .423 = 1/2.36, two-sided
P = .06.

3. Overall, clinically moderate and statistically 
significant treatment effect. 

H. Marginal approach preferred because 1st and 2nd 
events are different types. 

XII. Survival Curves with Recurrent Events

A. Survival plots with recurrent events only make
sense when the focus is on one ordered event at a
time.

B. For survival to a 1st event, the survival curve is
given by $S_1(t) = \Pr(T_1 > t)$ where $T_1$ = survival
time up to occurrence of the 1st event (ignores
later recurrent events).

C. For survival to the kth event, the survival curve
is given by $S_k(t) = \Pr(T_k > t)$ where $T_k$ = survival
time up to occurrence of the kth event.

D. Two versions for $S_k(t)$:

i. $S_{kc}(t)$ conditional: $T_{kc}$ = time from the (k - 1)st
to kth event, restricting data to subjects with
k - 1 events.

ii. $S_{km}(t)$ marginal: $T_{km}$ = time from study entry
to kth event, ignoring previous events.

E. Illustration of survival plots for recurrent event
data using a small dataset involving three subjects
Molly (M), Holly (H), and Polly (P).

XIII. Summary

A. Four approaches for analyzing recurrent event
survival data: the counting process (CP),
conditional 1, conditional 2, and marginal
approaches.

B. Data layouts differ for each approach.

C. CP approach uses Cox PH model; other
approaches use Cox SC model.

D. Choice of approach depends in general on carefully
considering the interpretation of each approach.

E. Should use robust estimation to adjust for
correlation of observations on the same subject.


Practice
Exercises


Answer questions 1 to 15 as true or false (circle T or F).

T F 1. A recurrent event is an event (i.e., failure) that 
can occur more than once over the follow-up on 
a given subject. 

T F 2. The Counting Process (CP) approach is appropriate
if a given subject can experience more than
one different type of event over follow-up.

T F 3. In the data layout for the CP approach, a subject
who has additional follow-up time after having
failed at time $t_{(j)}$ does not drop out of the risk set
after $t_{(j)}$.

T F 4. The CP approach requires the use of a stratified 
Cox (SC) PH model. 

T F 5. Using the CP approach, if exactly two subjects fail 
at month ty) = 10, but both these subjects have 
later recurrent events, then the number in the risk 
set at the next ordered failure time does not de¬ 
crease because of these two failures. 
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T F 6. The goal of robust estimation for the CP 

approach is to adjust estimated regression coef¬ 
ficients to account for the correlation of obser¬ 
vations within subjects when previously no such 
correlation was assumed. 

T F 7. Robust estimation is recommended for the CP 
approach but not for the alternative SC ap¬ 
proaches for analyzing recurrent event survival 
data. 

T F 8. The P-value obtained from using a robust stan¬ 
dard error will always be larger than the corre¬ 
sponding P-value from using a nonrobust stan¬ 
dard error. 

T F 9. The marginal approach uses the exact same 
(start, stop) data layout format used for the CP 
approach, except that for the marginal approach, 
the model used is a stratified Cox PH model 
rather than a standard (unstratified) PH 
model. 

T F 10. Suppose the maximum number of failures occurring 
for a given subject is five in a dataset to 
be analyzed using the marginal approach. Then a 
subject who failed only twice will contribute five 
lines of data corresponding to his or her two failures 
and the three additional failures that could 
have possibly occurred for this subject. 

T F 11. Suppose the maximum number of failures occurring 
for a given subject is five in a dataset to 
be analyzed using the conditional 1 approach. 
Then an interaction SC model used to carry 
out this analysis will have the following general 
model form: 
$h_g(t, \mathbf{X}) = h_{0g}(t)\exp[\beta_{1g}X_1 + \beta_{2g}X_2 + \cdots + \beta_{pg}X_p]$, $g = 1, 2, 3, 4, 5$. 

T F 12. Suppose a no-interaction SC model using the conditional 1 
approach is found (using a likelihood 
ratio test) not statistically different from a corresponding 
interaction SC model. Then if the no-interaction 
model is used, it will not be possible 
to separate out the effects of predictors within 
each stratum representing the recurring events on 
a given subject. 

T F 13. In choosing between the conditional 1 and the 
marginal approaches, the marginal approach 
would be preferred provided the different strata 
clearly represent different event types. 

T F 14. When using an interaction SC model to analyze 
recurrent event data, the estimated regression 
coefficients and corresponding standard errors 
for the first stratum always will be identical for 
the conditional 1, conditional 2, and marginal 
approaches. 

T F 15. The choice among the CP, conditional 1, conditional 2, 
and marginal approaches depends upon 
whether a no-interaction SC or an interaction SC 
model is more appropriate for one’s data. 

16. Suppose that Allie (A), Sally (S), and Callie (C) are the 
only three subjects in the dataset shown below. All three 
subjects have two recurrent events that occur at different 
times. 


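| ID | Status | Stratum | Start | Stop | tx |
|---|---|---|---|---|---|
| A | 1 | 1 | 0 | 70 | 1 |
| A | 1 | 2 | 70 | 90 | 1 |
| S | 1 | 1 | 0 | 20 | 0 |
| S | 1 | 2 | 20 | 30 | 0 |
| C | 1 | 1 | 0 | 10 | 1 |
| C | 1 | 2 | 10 | 40 | 1 |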


Fill in the following data layout describing survival 
(in weeks) to the first event (stratum 1). Recall that 
m_j and q_j denote the number of failures and censored 
observations at time t(j). The survival probabilities in the 
last column use the KM product limit formula. 

| t(j) | n_j | m_j | q_j | R(t(j)) | S_1(t(j)) |
|---|---|---|---|---|---|
| 0 | 3 | 0 | 0 | {A, S, C} | 1.00 |
| 10 | | | | | |


17. Plot the survival curve that corresponds to the data layout 
obtained for Question 16. 

[Blank axes: survival probability from 0 to 1.0 versus time from 0 to 100 weeks.] 

18. Fill in the following data layout describing survival 
(in weeks) from the first to second event using the 
conditional approach: 

| t(j) | n_j | m_j | q_j | R(t(j)) | S_2c(t(j)) |
|---|---|---|---|---|---|
| 0 | 3 | 0 | 0 | {A, S, C} | 1.00 |
| 10 | | | | | |


19. Plot the survival curve that corresponds to the data layout 
obtained for Question 18. 

[Blank axes: survival probability from 0 to 1.0 versus time from 0 to 100 weeks.] 

20. Fill in the following data layout describing survival 
(in weeks) to the second event using the marginal 
approach: 

| t(j) | n_j | m_j | q_j | R(t(j)) | S_2m(t(j)) |
|---|---|---|---|---|---|
| 0 | 3 | 0 | 0 | {A, S, C} | 1.00 |
| 30 | | | | | |


21. Plot the survival curve that corresponds to the data layout 
obtained for Question 20. 

[Blank axes: survival probability from 0 to 1.0 versus time from 0 to 100 weeks.] 

22. To what extent do the three plots obtained in Questions 
17, 19, and 21 differ? Explain briefly. 
Test 


The dataset shown below in the counting process layout 
comes from a clinical trial involving 36 heart attack patients 
between 40 and 50 years of age with implanted defibrillators 
who were randomized to one of two treatment groups 
(tx = 1 if treatment A, tx = 0 if treatment B) to reduce their 
risk for future heart attacks over a four-month period. The 
event of interest was experiencing a “high energy shock” 
from the defibrillator. The outcome is time (in days) until an 
event occurs. The covariate of interest was Smoking History 
(1 = ever smoked, 0 = never smoked). Questions about the 
analysis of this dataset follow. 


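| id | event | start | stop | tx | smoking |
|---|---|---|---|---|---|
| 01 | 1 | 0 | 39 | 0 | 0 |
| 01 | 1 | 39 | 66 | 0 | 0 |
| 01 | 1 | 66 | 97 | 0 | 0 |
| 02 | 1 | 0 | 34 | 0 | 1 |
| 02 | 1 | 34 | 65 | 0 | 1 |
| 02 | 1 | 65 | 100 | 0 | 1 |
| 03 | 1 | 0 | 36 | 0 | 0 |
| 03 | 1 | 36 | 67 | 0 | 0 |
| 03 | 1 | 67 | 96 | 0 | 0 |
| 04 | 1 | 0 | 40 | 0 | 0 |
| 04 | 1 | 40 | 80 | 0 | 0 |
| 04 | 0 | 80 | 111 | 0 | 0 |
| 05 | 1 | 0 | 45 | 0 | 0 |
| 05 | 1 | 45 | 68 | 0 | 0 |
| 05 | | 68 | | 0 | 0 |
| 06 | 1 | 0 | 33 | 0 | 1 |
| 06 | 1 | 33 | 66 | 0 | 1 |
| 06 | 1 | 66 | 96 | 0 | 1 |
| 07 | 1 | 0 | 34 | 0 | 1 |
| 07 | 1 | 34 | 67 | 0 | 1 |
| 07 | 1 | 67 | 93 | 0 | 1 |
| 08 | 1 | 0 | 39 | 0 | 1 |
| 08 | 1 | 39 | 72 | 0 | 1 |
| 08 | 1 | 72 | 102 | 0 | 1 |
| 09 | 1 | 0 | 39 | 0 | 1 |
| 09 | 1 | 39 | 79 | 0 | 1 |
| 09 | 0 | 79 | 109 | 0 | 1 |
| 10 | 1 | 0 | 36 | 0 | 0 |
| 10 | 1 | 36 | 65 | 0 | 0 |
| 10 | 1 | 65 | 96 | 0 | 0 |
| 11 | 1 | 0 | 39 | 0 | 0 |
| 11 | 1 | 39 | 78 | 0 | 0 |
| 11 | 1 | 78 | 108 | 0 | 0 |
| 12 | 1 | 0 | 39 | 0 | 1 |
| 12 | 1 | 39 | 80 | 0 | 1 |
| 12 | 0 | 80 | 107 | 0 | 1 |
| 13 | 1 | 0 | 36 | 0 | 1 |
| 13 | 1 | 36 | 64 | 0 | 1 |
| 13 | 1 | 64 | 95 | 0 | 1 |
| 14 | 1 | 0 | 46 | 0 | 1 |
| 14 | 1 | 46 | 77 | 0 | 1 |
| 14 | 0 | 77 | 111 | 0 | 1 |
| 15 | 1 | 0 | 61 | 0 | 1 |
| 15 | 1 | 61 | 79 | 0 | 1 |
| 15 | 0 | 79 | 111 | 0 | 1 |
| 16 | 1 | 0 | 57 | 0 | 1 |
| 16 | 0 | 57 | 79 | 0 | 1 |
| 16 | | 79 | | 0 | 1 |
| 17 | 1 | 0 | 37 | 0 | 1 |
| 17 | 1 | 37 | 76 | 0 | 1 |
| 17 | 0 | 76 | 113 | 0 | 1 |
| 18 | 1 | 0 | 58 | 0 | 1 |
| 18 | 1 | 58 | 67 | 0 | 1 |
| 18 | 0 | 67 | 109 | 0 | 1 |
| 19 | 1 | 0 | 58 | 1 | 1 |
| 19 | 1 | 58 | 63 | 1 | 1 |
| 19 | 1 | 63 | 106 | 1 | 1 |
| 20 | 1 | 0 | 45 | 1 | 0 |
| 20 | 1 | 45 | 72 | 1 | 0 |
| 20 | 1 | 72 | 106 | 1 | 0 |
| 21 | 1 | 0 | 48 | 1 | 0 |
| 21 | 1 | 48 | 81 | 1 | 0 |
| 21 | 1 | 81 | 112 | 1 | 0 |
| 22 | 1 | 0 | 38 | 1 | 1 |
| 22 | 1 | 38 | 64 | 1 | 1 |
| 22 | 1 | 64 | 97 | 1 | 1 |
| 23 | 1 | 0 | 51 | 1 | 1 |
| 23 | 1 | 51 | 69 | 1 | 1 |
| 23 | 0 | 69 | 98 | 1 | 1 |
| 24 | 1 | 0 | 43 | 1 | 1 |
| 24 | 1 | 43 | 67 | 1 | 1 |
| 24 | 0 | 67 | 111 | 1 | 1 |
| 25 | 1 | 0 | 46 | 1 | 0 |
| 25 | 1 | 46 | 66 | 1 | 0 |
| 25 | 1 | 66 | 110 | 1 | 0 |
| 26 | 1 | 0 | 33 | 1 | 1 |
| 26 | 1 | 33 | 68 | 1 | 1 |
| 26 | 1 | 68 | 96 | 1 | 1 |
| 27 | 1 | 0 | 51 | 1 | 1 |
| 27 | 1 | 51 | 97 | 1 | 1 |
| 27 | 0 | 97 | 115 | 1 | 1 |
| 28 | 1 | 0 | 37 | 1 | 0 |
| 28 | 1 | 37 | 79 | 1 | 0 |
| 28 | 1 | 79 | 93 | 1 | 0 |
| 29 | 1 | 0 | 41 | 1 | 1 |
| 29 | 1 | 41 | 73 | 1 | 1 |
| 29 | 0 | 73 | 111 | 1 | 1 |
| 30 | 1 | 0 | 57 | 1 | 0 |
| 30 | 1 | 57 | 78 | 1 | 0 |
| 30 | 1 | 78 | 99 | 1 | 0 |
| 31 | 1 | 0 | 44 | 1 | 1 |
| 31 | 1 | 44 | 74 | 1 | 1 |
| 31 | 1 | 74 | 96 | 1 | 1 |
| 32 | 1 | 0 | 38 | 1 | 1 |
| 32 | 1 | 38 | 71 | 1 | 1 |
| 32 | 1 | 71 | 105 | 1 | 1 |
| 33 | 1 | 0 | 38 | 1 | 1 |
| 33 | 1 | 38 | 64 | 1 | 1 |
| 33 | 1 | 64 | 97 | 1 | 1 |
| 34 | 1 | 0 | 38 | 1 | 1 |
| 34 | 1 | 38 | 63 | 1 | 1 |
| 34 | 1 | 63 | 99 | 1 | 1 |
| 35 | 1 | 0 | 49 | 1 | 1 |
| 35 | 1 | 49 | 70 | 1 | 1 |
| 35 | 0 | 70 | 107 | 1 | 1 |
| 36 | 1 | 0 | 34 | 1 | 1 |
| 36 | 1 | 34 | 81 | 1 | 1 |
| 36 | 1 | 81 | 97 | 1 | 1 |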


Table T.1 below provides the results for the treatment variable 
(tx) from no-interaction models over all four recurrent event 
analysis approaches. Each model was fit using either a Cox 
PH model (CP approach) or a Stratified Cox (SC) PH model 
(conditional 1, conditional 2, marginal approaches) that 
controlled for the covariate smoking. 


Table T.1. Comparison of Results for the Treatment Variable (tx) Obtained 
from No-Interaction Models^a Across Four Methods (Defibrillator Study) 

| Model | CP | Conditional 1 | Conditional 2 | Marginal |
|---|---|---|---|---|
| Parameter estimate^b | 0.0839 | 0.0046 | -0.0018 | -0.0043 |
| Robust standard error | 0.1036 | 0.2548 | 0.1775 | 0.2579 |
| Chi-square | 0.6555 | 0.0003 | 0.0001 | 0.0003 |
| p-value | 0.4182 | 0.9856 | 0.9918 | 0.9866 |
| Hazard ratio | 1.087 | 1.005 | 0.998 | 0.996 |
| 95% confidence interval | (0.888, 1.332) | (0.610, 1.655) | (0.705, 1.413) | (0.601, 1.651) |

a No-interaction SC model fitted with PROC PHREG for the conditional 1, conditional 2 and 
marginal methods; no-interaction standard Cox PH model fitted for CP approach. 
b Estimated coefficient of tx variable. 
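
The columns of Table T.1 are linked by the usual Wald formulas: the hazard ratio is exp(estimate), the chi-square is (estimate / standard error) squared, and the 95% interval is exp(estimate ± 1.96 × standard error). The sketch below illustrates this for the CP column only; the function name `wald_summary` is hypothetical and not part of any package, and because the inputs are the rounded table entries, the results should agree with the table only up to rounding.

```python
import math

def wald_summary(beta, robust_se, z_crit=1.96):
    """Hazard ratio, Wald chi-square (1 df), two-sided p-value, and 95% CI."""
    z = beta / robust_se
    chi_square = z * z
    # With 1 df, P(chi-square > z^2) equals the two-sided standard normal tail area.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    hazard_ratio = math.exp(beta)
    ci = (math.exp(beta - z_crit * robust_se),
          math.exp(beta + z_crit * robust_se))
    return hazard_ratio, chi_square, p_value, ci

# CP column of Table T.1: estimate 0.0839, robust standard error 0.1036
hr, chi2, p, (lo, hi) = wald_summary(0.0839, 0.1036)
print(f"HR = {hr:.3f}, chi-square = {chi2:.3f}, p = {p:.3f}, CI = ({lo:.3f}, {hi:.3f})")
```

The same call with the other three columns' estimates and standard errors gives the corresponding rows for the SC approaches.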
1. State the hazard function formula for the no-interaction 
model used to fit the CP approach. 

2. Based on the CP approach, what do you conclude about the 
effect of treatment (tx)? Explain briefly using the results in 
Table T.1. 

3. State the hazard function formulas for the no-interaction 
and interaction SC models corresponding to the use of the 
marginal approach for fitting these data. 

4. Table T.1 gives results for “no-interaction” SC models 
because likelihood ratio (LR) tests comparing a “no-interaction” 
with an “interaction” SC model were not significant. 
Describe the LR test used for the marginal model 
(full and reduced models, null hypothesis, test statistic, 
distribution of test statistic under the null). 

5. How can you criticize the use of a no-interaction SC model 
for any of the SC approaches, despite the finding that the 
above likelihood ratio test was not significant? 

6. Based on the study description given earlier, why does it 
make sense to recommend the CP approach over the other 
alternative approaches? 

7. Under what circumstances/assumptions would you recommend 
using the marginal approach instead of the CP 
approach? 

Table T.2 below provides ordered failure times and corresponding 
risk set information that result for the 36 subjects 
in the above Defibrillator Study dataset using the Counting 
Process (CP) data layout format. 


Table T.2. Ordered Failure Times and Risk Set Information 
for Defibrillator Study (CP) 

| Ordered failure times t(j) | # in risk set n_j | # failed m_j | # censored in [t(j), t(j+1)) | Subject ID #s for outcomes in [t(j), t(j+1)) |
|---|---|---|---|---|
| 0 | 36 | 0 | 0 | - |
| 33 | 36 | 2 | 0 | 6, 26 |
| 34 | 36 | 3 | 0 | 2, 7, 36 |
| 36 | 36 | 3 | 0 | 3, 10, 13 |
| 37 | 36 | 2 | 0 | 17, 28 |
| 38 | 36 | 4 | 0 | 22, 32, 33, 34 |
| 39 | 36 | 5 | 0 | 1, 8, 9, 11, 12 |
| 40 | 36 | 1 | 0 | 4 |
| 41 | 36 | 1 | 0 | 29 |
| 43 | 36 | 1 | 0 | 24 |
| 44 | 36 | 1 | 0 | 31 |
| 45 | 36 | 2 | 0 | 5, 20 |
| 46 | 36 | 2 | 0 | 14, 25 |
| 48 | 36 | 1 | 0 | 21 |
| 49 | 36 | 1 | 0 | 35 |
| 51 | 36 | 2 | 0 | 23, 27 |
| 57 | 36 | 2 | 0 | 16, 30 |
| 58 | 36 | 2 | 0 | 18, 19 |
| 61 | 36 | 1 | 0 | 15 |
| 63 | 36 | 2 | 0 | 19, 34 |
| 64 | 36 | 3 | 0 | 13, 22, 33 |
| 65 | 36 | 2 | 0 | 2, 10 |
| 66 | 36 | 3 | 0 | 1, 6, 25 |
| 67 | 36 | 4 | 0 | 3, 7, 18, 24 |
| 68 | 36 | 2 | 0 | 5, 26 |
| 69 | 35 | 1 | 0 | 23 |
| 70 | 35 | 1 | 0 | 35 |
| 71 | 35 | 1 | 0 | 32 |
| 72 | 35 | 2 | 0 | 8, 20 |
| 73 | 35 | 1 | 0 | 29 |
| 74 | 35 | 1 | 0 | 31 |
| 76 | 35 | 1 | 0 | 17 |
| 77 | 35 | 1 | 0 | 14 |
| 78 | 35 | 2 | 0 | 11, 30 |
| 79 | 35 | 3 | 1 | 9, 15, 16, 28 |
| 80 | 34 | 2 | 0 | 4, 12 |
| 81 | 34 | 2 | 0 | 21, 36 |
| 93 | 34 | 2 | 0 | 7, 28 |
| 95 | 32 | 1 | 0 | 13 |
| 96 | 31 | 5 | 0 | 3, 6, 10, 26, 31 |
| 97 | 26 | 5 | 0 | 1, 22, 27, 33, 36 |
| 98 | 22 | 0 | 1 | 23 |
| 99 | 21 | 2 | 0 | 30, 34 |
| 100 | 19 | 1 | 0 | 2 |
| 102 | 18 | 1 | 0 | 8 |
| 105 | 17 | 1 | 0 | 32 |
| 106 | 16 | 2 | 0 | 19, 20 |
| 107 | 14 | 1 | 1 | 12, 35 |
| 108 | 12 | 1 | 0 | 11 |
| 109 | 11 | 0 | 2 | 9, 18 |
| 110 | 9 | 1 | 0 | 25 |
| 111 | 8 | 0 | 5 | 4, 14, 15, 24, 29 |
| 112 | 3 | 1 | 0 | 21 |
| 113 | 2 | 0 | 1 | 17 |
| 115 | 1 | 0 | 1 | 27 |
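
As a rough illustration of how a risk set is read off a counting process layout, the sketch below treats a subject as at risk at time t when one of its records satisfies start < t ≤ stop. That membership rule and the helper name `risk_set` are assumptions made for this example. The records are the rows for subjects 01, 05, and 16 copied from the dataset above; the third rows for subjects 05 and 16 have no recorded event or stop value and are left out.

```python
# (id, event, start, stop) rows for three subjects from the defibrillator dataset
records = [
    ("01", 1, 0, 39), ("01", 1, 39, 66), ("01", 1, 66, 97),
    ("05", 1, 0, 45), ("05", 1, 45, 68),
    ("16", 1, 0, 57), ("16", 0, 57, 79),
]

def risk_set(records, t):
    """IDs with a follow-up interval (start, stop] that contains time t."""
    return sorted({sid for sid, _event, start, stop in records if start < t <= stop})

for t in (39, 68, 69, 80):
    print(t, risk_set(records, t))
```

Under this rule a subject who fails but has a later record stays in the risk set, whereas a subject whose last record has ended does not.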







8. In Table T.2, why does the number in the risk set (n_j) remain 
unchanged through failure time (i.e., day) 68, even 
though 50 events occur up to that time? 

9. Why does the number in the risk set change from 31 to 
26 when going from time 96 to 97? 

10. Why is the number of failures (m_j) equal to 3 and the 
number of censored subjects equal to 1 in the interval 
between failure times 79 and 80? 

11. What 5 subjects were censored in the interval between 
failure times 111 and 112? 

12. Describe the event history for subject #5, including his or 
her effect on changes in the risk set. 

Based on the CP data layout of Table T.2, the following table 
(T.3) of survival probabilities has been calculated. 

Table T.3. Survival Probabilities for Defibrillator Study 
Data Based on CP Layout 

The last column uses S(t(j)) = S(t(j-1)) × Pr(T > t(j) | T ≥ t(j)). 

| t(j) | n_j | m_j | q_j | S(t(j)) |
|---|---|---|---|---|
| 0 | 36 | 0 | 0 | 1.0 |
| 33 | 36 | 2 | 0 | 1 × 34/36 = .94 |
| 34 | 36 | 3 | 0 | .94 × 33/36 = .87 |
| 36 | 36 | 3 | 0 | .87 × 33/36 = .79 |
| 37 | 36 | 2 | 0 | .79 × 34/36 = .75 |
| 38 | 36 | 4 | 0 | .75 × 32/36 = .67 |
| 39 | 36 | 5 | 0 | .67 × 31/36 = .57 |
| 40 | 36 | 1 | 0 | .57 × 35/36 = .56 |
| 41 | 36 | 1 | 0 | .56 × 35/36 = .54 |
| 43 | 36 | 1 | 0 | .54 × 35/36 = .53 |
| 44 | 36 | 1 | 0 | .53 × 35/36 = .51 |
| 45 | 36 | 2 | 0 | .51 × 34/36 = .48 |
| 46 | 36 | 2 | 0 | .48 × 34/36 = .46 |
| 48 | 36 | 1 | 0 | .46 × 35/36 = .44 |
| 49 | 36 | 1 | 0 | .44 × 35/36 = .43 |
| 51 | 36 | 2 | 0 | .43 × 34/36 = .41 |
| 57 | 36 | 2 | 0 | .41 × 34/36 = .39 |
| 58 | 36 | 2 | 0 | .39 × 34/36 = .36 |
| 61 | 36 | 1 | 0 | .36 × 35/36 = .35 |
| 63 | 36 | 2 | 0 | .35 × 34/36 = .33 |
| 64 | 36 | 3 | 0 | .33 × 33/36 = .31 |
| 65 | 36 | 2 | 0 | .31 × 34/36 = .29 |
| 66 | 36 | 3 | 0 | .29 × 33/36 = .27 |
| 67 | 36 | 4 | 0 | .27 × 32/36 = .24 |
| 68 | 36 | 2 | 0 | .24 × 34/36 = .22 |
| 69 | 35 | 1 | 0 | .22 × 34/35 = .22 |
| 70 | 35 | 1 | 0 | .22 × 34/35 = .21 |
| 71 | 35 | 1 | 0 | .21 × 34/35 = .20 |
| 72 | 35 | 2 | 0 | .20 × 33/35 = .19 |
| 73 | 35 | 1 | 0 | .19 × 34/35 = .19 |
| 74 | 35 | 1 | 0 | .19 × 34/35 = .18 |
| 76 | 35 | 1 | 0 | .18 × 34/35 = .18 |
| 77 | 35 | 1 | 0 | .18 × 34/35 = .17 |
| 78 | 35 | 2 | 0 | .17 × 33/35 = .16 |
| 79 | 35 | 3 | 1 | .16 × 31/35 = .14 |
| 80 | 34 | 2 | 0 | .14 × 32/34 = .13 |
| 81 | 34 | 2 | 0 | .13 × 32/34 = .13 |
| 95 | 32 | 1 | 0 | .13 × 31/32 = .12 |
| 96 | 31 | 5 | 0 | .12 × 26/31 = .10 |
| 97 | 26 | 5 | 0 | .10 × 21/26 = .08 |
| 98 | 22 | 0 | 1 | .08 × 22/22 = .08 |
| 99 | 21 | 2 | 0 | .08 × 19/21 = .07 |
| 100 | 19 | 1 | 0 | .07 × 18/19 = .07 |
| 102 | 18 | 1 | 0 | .07 × 17/18 = .06 |
| 105 | 17 | 1 | 0 | .06 × 16/17 = .06 |
| 106 | 16 | 2 | 0 | .06 × 14/16 = .05 |
| 107 | 14 | 1 | 1 | .05 × 13/14 = .05 |
| 108 | 12 | 1 | 0 | .05 × 11/12 = .05 |
| 109 | 11 | 0 | 2 | .05 × 11/11 = .05 |
| 110 | 9 | 1 | 0 | .05 × 8/9 = .04 |
| 111 | 8 | 0 | 5 | .04 × 8/8 = .04 |
| 112 | 3 | 1 | 0 | .04 × 2/3 = .03 |
| 113 | 2 | 0 | 1 | .03 × 2/2 = .03 |
| 115 | 1 | 0 | 1 | .03 × 1/1 = .03 |
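
The arithmetic in the last column of Table T.3 is a running product of (n_j - m_j)/n_j over the ordered failure times. A minimal sketch of that calculation follows, using only the first five failure times of the table; the function name `product_limit` is hypothetical, and the sketch carries the unrounded product forward and rounds only for display.

```python
def product_limit(rows):
    """Cumulative product of (n_j - m_j) / n_j over ordered failure times."""
    survival = 1.0
    out = []
    for t, n_at_risk, n_failed in rows:
        survival *= (n_at_risk - n_failed) / n_at_risk
        out.append((t, survival))
    return out

# First five ordered failure times of Table T.3: (t(j), n_j, m_j)
rows = [(33, 36, 2), (34, 36, 3), (36, 36, 3), (37, 36, 2), (38, 36, 4)]
curve = product_limit(rows)
for t, s in curve:
    print(t, round(s, 2))
```

Whether a product of this kind is an appropriate summary for counting process data is a separate matter, which Question 13 takes up.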


13. Suppose the survival probabilities shown in Table T.3 are 
plotted on the y-axis versus corresponding ordered failure 
times on the x-axis. 

i. What is being plotted by such a curve? (Circle one or 
more choices.) 

a. Pr(T_1 > t) where T_1 = time to first event from study 
entry. 

b. Pr(T > t) where T = time from any event to the 
next recurrent event. 

c. Pr(T > t) where T = time to any event from study 
entry. 

d. Pr(not failing prior to time t). 

e. None of the above. 

ii. Can you criticize the use of the product limit formula 
for S(t(j)) in Table T.3? Explain briefly. 
14. Use Table T.2 to complete the data layouts for plotting the 
following survival curves. 

a. $S_1(t) = \Pr(T_1 > t)$ where $T_1$ = time to first event from 
study entry. The last column uses 
S(t(j)) = S(t(j-1)) × Pr(T_1 > t | T_1 ≥ t). 

| t(j) | n_j | m_j | q_j | S(t(j)) |
|---|---|---|---|---|
| 0 | 36 | 0 | 0 | 1.00 |
| 33 | 36 | 2 | 0 | 0.94 |
| 34 | 34 | 3 | 0 | 0.86 |
| 36 | 31 | 3 | 0 | 0.78 |
| 37 | 28 | 2 | 0 | 0.72 |
| 38 | 26 | 4 | 0 | 0.61 |
| 39 | 22 | 5 | 0 | 0.47 |
| 40 | 17 | 1 | 0 | 0.44 |
| 41 | 16 | 1 | 0 | 0.42 |
| 43 | 15 | 1 | 0 | 0.39 |
| 44 | 14 | 1 | 0 | 0.36 |
| 45 | 13 | 2 | 0 | 0.31 |
| 46 | 11 | 2 | 0 | 0.25 |
| 48 | 9 | 1 | 0 | 0.22 |
| 49 | 8 | 1 | 0 | 0.19 |
| 51 | | | | |
| 57 | | | | |
| 58 | | | | |
| 61 | | | | |

b. Conditional $S_{2c}(t) = \Pr(T_{2c} > t)$ where $T_{2c}$ = time to 
second event from first event. The last column uses 
S(t(j)) = S(t(j-1)) × Pr(T_2c > t | T_2c ≥ t). 

| t(j) | n_j | m_j | q_j | S(t(j)) |
|---|---|---|---|---|
| 0 | 36 | 0 | 0 | 1.00 |
| 5 | 36 | 1 | 0 | 0.97 |
| 9 | 35 | 1 | 0 | 0.94 |
| 18 | 34 | 2 | 0 | 0.89 |
| 20 | 32 | 1 | 0 | 0.86 |
| 21 | 31 | 2 | 1 | 0.81 |
| 23 | 28 | 1 | 0 | 0.78 |
| 24 | 27 | 1 | 0 | 0.75 |
| 25 | 26 | 1 | 0 | 0.72 |
| 26 | 25 | 2 | 0 | 0.66 |
| 27 | 23 | 2 | 0 | 0.60 |
| 28 | 21 | 1 | 0 | 0.58 |
| 29 | 20 | 1 | 0 | 0.55 |
| 30 | 19 | 1 | 0 | 0.52 |
| 31 | 18 | 3 | 0 | 0.43 |
| 32 | 15 | 1 | 0 | 0.40 |
| 33 | 14 | 5 | 0 | 0.26 |
| 35 | 9 | 1 | 0 | 0.23 |
| 39 | 8 | 2 | 0 | 0.17 |
| 40 | | | | |
| 41 | | | | |
| 42 | | | | |
| 46 | | | | |
| 47 | | | | |

c. Marginal $S_{2m}(t) = \Pr(T_{2m} > t)$ where $T_{2m}$ = time to 
second event from study entry. The last column uses 
S(t(j)) = S(t(j-1)) × Pr(T_2m > t | T_2m ≥ t). 

| t(j) | n_j | m_j | q_j | S(t(j)) |
|---|---|---|---|---|
| 0 | 36 | 0 | 0 | 1.00 |
| 63 | 36 | 2 | 0 | 0.94 |
| 64 | 34 | 3 | 0 | 0.86 |
| 65 | 31 | 2 | 0 | 0.81 |
| 66 | 29 | 3 | 0 | 0.72 |
| 67 | 26 | 4 | 0 | 0.61 |
| 68 | 22 | 2 | 0 | 0.56 |
| 69 | 20 | 1 | 0 | 0.53 |
| 70 | 19 | 1 | 0 | 0.50 |
| 71 | 18 | 1 | 0 | 0.47 |
| 72 | 17 | 2 | 0 | 0.42 |
| 73 | 15 | 1 | 0 | 0.39 |
| 74 | 14 | 1 | 0 | 0.36 |
| 76 | 13 | 1 | 0 | 0.33 |
| 77 | 12 | 1 | 0 | 0.31 |
| 78 | 11 | 2 | 0 | 0.25 |
| 79 | | | | |
| 81 | | | | |

15. The survival curves corresponding to each of the data layouts 
(a, b, c) described in Question 14 will be different. 
Why? 
Answers to Practice Exercises 


1. T 

2. F: The marginal approach is appropriate if events are of 
different types. 

3. T 

4. F: The marginal, conditional 1, and conditional 2 approaches 
all require a SC model, whereas the CP approach 
requires a standard PH model. 

5. T 

6. F: Robust estimation adjusts the standard errors of regression 
coefficients. 

7. F: Robust estimation is recommended for all four approaches, 
not just the CP approach. 

8. F: The P-value from robust estimation may be either larger 
or smaller than the corresponding P-value from nonrobust 
estimation. 

9. F: Replace the word marginal with conditional 1 or conditional 2. 
The marginal approach does not use (Start, 
Stop) columns in its layout. 

10. T 

11. T 

12. T 

13. T 

14. T 

15. F: The choice among the CP, conditional 1, conditional 2, 
and marginal approaches depends on carefully considering 
the interpretation of each approach. 


17. [Plot of $S_1(t)$.] 

21. [Plot of $S_{2m}(t)$, marginal.] 

22. All three plots differ because the risk sets for each plot 
are defined differently inasmuch as the failure times are 
different for each plot. 
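
The three layouts asked for in Questions 16, 18, and 20 can be sketched from the (start, stop) rows of the Allie/Sally/Callie dataset. The code below is an illustration under the assumption that every time is an observed failure (true for this dataset, where Status is 1 on every row); the helper name `km_layout` is hypothetical.

```python
def km_layout(times):
    """KM layout rows (t, n_at_risk, n_failed, S(t)) when every time is a failure."""
    rows, survival = [], 1.0
    remaining = len(times)
    for t in sorted(set(times)):
        failed = times.count(t)
        survival *= (remaining - failed) / remaining
        rows.append((t, remaining, failed, round(survival, 2)))
        remaining -= failed
    return rows

# (start, stop) in weeks for the 1st and 2nd events, from the Question 16 dataset
data = {"A": [(0, 70), (70, 90)], "S": [(0, 20), (20, 30)], "C": [(0, 10), (10, 40)]}

first = [iv[0][1] for iv in data.values()]                   # study entry to 1st event
conditional = [iv[1][1] - iv[1][0] for iv in data.values()]  # 1st event to 2nd event
marginal = [iv[1][1] for iv in data.values()]                # study entry to 2nd event

for name, times in [("S1", first), ("S2c", conditional), ("S2m", marginal)]:
    print(name, km_layout(times))
```

The three calls differ only in which time is measured, which is the point made in the answer to Question 22: the failure times, and hence the risk sets, change with the definition of survival time.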
Introduction 


This chapter considers survival data in which each subject 
can experience only one of several different types of events 
over follow-up. This situation contrasts with the topic of the 
preceding chapter in which subjects could experience more 
than one event of a given type. When only one of several different 
types of event can occur, we refer to the probabilities 
of these events as “competing risks,” which explains the title 
of this chapter. 

Modeling competing risks survival data can be carried out 
using a Cox model, a parametric survival model, or models 
that use the cumulative incidence (rather than survival). In 
this chapter, we mainly consider the Cox model because of its 
wide popularity and also because of the availability of computer 
programs that use the Cox model for analysis of competing 
risks. 

The typical (“cause-specific”) approach for analyzing competing 
risks data is to perform a survival analysis for each 
event type separately, where the other (competing) event types 
are treated as censored categories. There are two primary 
drawbacks to the above method. One problem is that the 
above method requires the assumption that competing risks 
are independent. The second problem is that the generalized 
Kaplan-Meier (KM)-based product-limit survival curve obtained 
from fitting separate Cox models for each event type 
has questionable interpretation when there are competing 
risks. 
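
The recoding step behind the cause-specific approach can be sketched as follows: for each event type in turn, the status variable is 1 only for that type, and every other outcome, including the competing event types, is coded as censored. The cause codes and the function name `cause_specific_status` are hypothetical and chosen only for illustration.

```python
def cause_specific_status(causes, cause_of_interest):
    """Return 1 where the event of interest occurred, 0 otherwise.

    Code 0 in `causes` means censored; any other code different from
    `cause_of_interest` is a competing event and is recoded as censored.
    """
    return [1 if c == cause_of_interest else 0 for c in causes]

# Hypothetical cause codes: 0 = censored, 1, 2, 3 = three competing event types
causes = [1, 0, 2, 3, 1, 2]
for k in (1, 2, 3):
    print(k, cause_specific_status(causes, k))
```

Each recoded status vector would then be paired with the same survival times and passed to a separate survival analysis, one per event type.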

Unfortunately, if the independence assumption is incorrect, 
there is no direct methodology available for analyzing competing 
risks simultaneously. The only “indirect” method for 
addressing this problem involves carrying out a “sensitivity 
analysis” that treats subjects with events from competing 
risks as all being event-free or as all experiencing the 
event of interest. An example of this “sensitivity” approach is 
provided. 
The primary alternative summary curve to the KM-based survival 
curve is the “cumulative incidence curve (CIC),” which 
estimates the “marginal probability” of an event (both terms 
are defined in this chapter). This CIC is not estimated using 
a product-limit formulation, and its computation is not included 
in mainstream statistical packages. Moreover, the independence 
of competing risks is still required when a proportional 
hazard model is used to obtain hazard ratio estimates 
for individual competing risks as an intermediate step in the 
computation of a CIC. Nevertheless, the CIC has a meaningful 
interpretation in terms of treatment utility regardless of 
whether competing risks are independent. A variation of the 
CIC, called the “conditional probability curve (CPC),” provides 
a risk probability conditional on an individual not experiencing 
any of the other competing risks by time t. 
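
As a preview of how a CIC differs from a product-limit curve, the sketch below uses one standard nonparametric form: at each ordered failure time, the overall (any-cause) survival just before that time is multiplied by the proportion of the risk set failing from the cause of interest, and these terms are summed. The chapter's own definition comes later; the function name `cumulative_incidence` and the four-subject dataset are hypothetical.

```python
def cumulative_incidence(times, causes, cause_of_interest):
    """CIC for one cause: running sum of S(previous time) * m_cj / n_j.

    `causes` uses 0 for censored; S is the overall (any-cause) KM survival.
    """
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    survival, cic, curve = 1.0, 0.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [c for (u, c) in data if u == t]          # all outcomes at time t
        any_event = sum(1 for c in tied if c != 0)
        of_interest = sum(1 for c in tied if c == cause_of_interest)
        if any_event:
            cic += survival * of_interest / n_at_risk    # uses S just before t
            survival *= (n_at_risk - any_event) / n_at_risk
            curve.append((t, cic))
        n_at_risk -= len(tied)
        i += len(tied)
    return curve

# Hypothetical data: cause 0 = censored, causes 1 and 2 compete
times = [1, 2, 3, 4]
causes = [1, 2, 0, 1]
cic1 = cumulative_incidence(times, causes, 1)
cic2 = cumulative_incidence(times, causes, 2)
print(cic1)
print(cic2)
```

In this form the cause-specific CICs add up to one minus the overall survival at each failure time, which is not generally true of one minus the separate cause-specific KM curves.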

An equivalent approach to the cause-specific method for analyzing 
competing risks is called the Lunn-McNeil (LM) approach. 
The LM approach allows only one model to be fit 
rather than separate models for each event type and, moreover, 
allows flexibility to perform statistical inferences about 
simpler versions of the LM model. This approach has added 
appeal in that competing events are not considered as simply 
being censored. Nevertheless, as with the cause-specific 
approach, the LM method assumes the independence of competing 
risks. 
Abbreviated Outline 


The outline below gives the user a preview of the material 
covered by the presentation. A detailed outline for review purposes 
follows the presentation. 

I. Overview 

II. Examples of Competing Risks Data 

III. Byar Data 

IV. Method 1: Separate Models for Different Event Types 

V. The Independence Assumption 

VI. Cumulative Incidence Curves (CIC) 

VII. Conditional Probability Curves (CPC) 

VIII. Method 2: Lunn-McNeil (LM) Approach 

IX. Method 2a: Alternative Lunn-McNeil (LM_alt) Approach 

X. Method 1 (Separate Models) versus Method 2 (LM Approach) 

XI. Summary 
Objectives 


Upon completing this chapter, the learner should be able to: 

1. State or recognize examples of competing risks survival 
data. 

2. Given competing risks data, outline the steps needed to 
analyze such data using separate Cox models. 

3. Given computer output from the analysis of competing 
risks data, carry out an analysis to assess the effects of 
explanatory variables on one or more of the competing 
risks. 

4. State or describe the independence assumption typically 
required in the analysis of competing risks data. 

5. Describe how to carry out and/or interpret a “sensitivity 
analysis” to assess the independence assumption about 
competing risks. 

6. State why a survival function obtained from competing 
risks data using the Cox model has a questionable interpretation. 

7. State or describe the “cumulative incidence” approach 
for analyzing competing risks data. 

8. Given competing risks data, describe how to calculate a 
CIC and/or a CPC curve. 

9. Given competing risks data, outline the steps needed to 
analyze such data using the Lunn-McNeil method. 

10. Given computer output from fitting either a LM or LM_alt 
model, carry out an analysis to assess the effect of explanatory 
variables on one or more of the competing 
risks. 
Presentation 


I. Overview 



Different types of 
events: A, B, C, ... 
(competing risks) 
are possible, but only 
one of these can 
occur per subject 



In this chapter, we consider survival data in which 
each subject can experience only one of several different 
types of events over follow-up. The probabilities of 
these events are typically referred to as competing 
risks. We describe how to use the Cox PH model 
to analyze such data, the drawbacks to this approach, 
and some approaches for addressing these 
drawbacks. 


II. Examples of Competing 
Risks Data 

1. Dying from either lung cancer or 
stroke 

2. Advanced cancer patients either 
dying from surgery or getting 
hospital infection 

3. Soldiers dying in accident or in 
combat 

4. Limb sarcoma patients 
developing local recurrence, lung 
metastasis, or other metastasis 
over follow-up 


Each example above allows only 
one event out of several possible 
events to occur per subject 

If event not death, then recurrent 
events are possible 

Competing risks + recurrent events 
beyond scope of this chapter 


Competing risks occur when there are at least two 
possible ways that a person can fail, but only one 
such failure type can actually occur. For example, 

1. A person can die from lung cancer or from a 
stroke, but not from both (although he can have 
both lung cancer and atherosclerosis before he 
dies); 

2. Patients with advanced-stage cancer may die 
after surgery before their hospital stay is long 
enough for them to get a hospital infection; 

3. Soldiers in war may die during combat or may 
die by (e.g., traffic) accident; 

4. In a clinical trial, patients with nonmetastatic 
limb sarcoma undergoing chemotherapy and 
surgery might develop a local recurrence, lung 
metastasis, or other metastasis after follow-up. 

For each of the above examples, the possible 
events of interest differ, but only one such event 
can occur per subject. Note, however, if at least 
one of the possible event types does not involve 
death, it is also possible that such events can recur 
over follow-up. Thus, although the analysis of recurrent 
events that also involves competing risks 
may be required, this more complex topic is beyond 
the scope of this chapter (see Tai et al., 2001). 
Objective: assess 

$X_1, X_2, \ldots, X_p \Rightarrow$ Failure rate 
(survival probability) 

for any one event allowing for 
competing risks from other possible 
events 


A logical objective for competing risks data is to 
assess the relationship of relevant predictors to the 
failure rate or corresponding survival probability 
of any one of the possible events allowing for the 
competing risks of the other ways to fail. 


Another objective 
Compare hazard rate for event A 
with hazard rate for event B 


We might also want to compare the failure rates 
(e.g., using a hazard ratio) for two or more possible 
events, controlling for relevant predictors. 


Lung Cancer vs. Stroke (1) 

$HR_{LC}$(E vs. not E) = 1? 

(allowing for competing risk from 
stroke) 


In the lung cancer versus stroke example above, we 
might ask whether the lung cancer death rate in 
“exposed” persons is different from the lung cancer 
rate in “unexposed” persons, allowing for the 
possibility that subjects could have died from a 
stroke instead. 


HR(LC vs. Stroke) = 1? 
(controlling for predictors) 


We might also want to know if the lung cancer 
death rate differs from the stroke death rate, controlling 
for predictors of interest. 


Surgery Death vs. Hospital 
Infection (2) 


$HR_{HOSPINF}$(E vs. not E) = 1? 
(allowing for competing risk from 
surgery) 

Note: death from surgery reduces 
number of hospital infections to be 
treated 


In the second example, the competing risks are 
death from surgery versus development of a hospital 
infection. For infection control investigators, 
the hospital infection event is of primary interest. 
Nevertheless, the occurrence of death from 
surgery reduces the burden of hospital infection 
control required. Thus, the estimation of hospital 
infection rates is complicated by the competing 
risk of death from surgery. 


Accidental Death vs. Combat 
Death (3) 

$HR_{combat}$(E vs. not E) 

(allowing competing risk of accidental 
death) 

Suppose entire company dies at 
accident time t before entering combat 

⇓ 

$S_{combat}(t) = P(T_{combat} > t) = 1$ 
where $T_{combat}$ = time to combat death 


The third example involves competing risks of 
death from either combat or accident in a company 
of soldiers. Here, primary interest concerns 
the hazard ratio for combat death comparing two 
exposure groups. Suppose the entire company dies 
at time t in a helicopter accident on their way to 
a combat area. Because no one died in combat 
by time t, the survival probability of not dying in 
combat is one, even though no combat took place. 


However, 

$T_{C+A}$ = time to combat or accidental death 

⇓ 

“event free” $S_{C+A}(t) = P(T_{C+A} > t) = 0$ 


However, if we define the outcome of interest as 
death from either combat or accident, the “event 
free” survival probability is zero after the accident 
occurred (at time t). 


Moreover, 

$S_{KM}(T_{combat} > t)$ 
is undefined because the risk set is 
empty at time t 


Moreover, the KM survival probability for combat 
death at time t is undefined because no one was at 
risk for a combat death at time t. 


Competing Risks Data Survival 
Curve Interpretation? 


This example points out that when there are competing 
risks, the interpretation of a survival curve 
may be difficult or questionable (more on this issue 
later). 


Limb sarcoma patients (4) 
Competing risks 

1 = local recurrence, 2 = lung meta¬ 
stasis, or 3 = other metastasis 

HR C (E vs. not E), c = 1,2,3 
(allowing for competing risk from 
other two failure types) 


In the fourth example involving limb sarcoma pa¬ 
tients, the competing risks are the three failure 
types shown at the left. 

In this study, the investigators wanted hazard ra¬ 
tios for each failure type, allowing for competing 
risks from the other two failure types. 


HR(Lung Metastasis vs. Local 
Recurrence)? Controlling for 
Predictors 


It was also of interest to compare the failure rates 
for lung metastasis versus local recurrence (or any 
other two of the three failure types), controlling for 
relevant predictors. 


No failure types involve death 

⇓ 

Recurrent events possible 

But can use classical competing 
risk methods if focus on only first 
failure 


Because none of the failure types involves death, 
recurrent events are possible for any of the three 
failure types. If, however, the information on only 
the first failure is targeted, the classical competing 
risk methodology described in this chapter can be 
applied. 

III. Byar Data 

• Randomized clinical trial 

• Compare treatments for Stage 
III and IV prostate cancer 

• Rx status: placebo or one of 
3 dose levels of DES 


We now introduce an example of competing risks 
survival analysis of data from a randomized clini¬ 
cal trial (Byar and Green, 1980) comparing treat¬ 
ments for prostate cancer. We henceforth refer to 
this as the Byar data. Patients with Stages III 
(local extension beyond the prostate gland) and 
IV (distant metastases, elevated acid phosphatase, 
or both) prostate cancer were randomized to re¬ 
ceive either a placebo or one of three dose levels of 
the active treatment diethylstilbestrol (DES). 


Competing risks: deaths from 

Cancer (main focus) 

CVD 

Other 

Covariate information collected 
Some predictors grouped 


In this study, patients could die from prostate can¬ 
cer, cardiovascular disease, or other causes. Co¬ 
variate information was also collected to account 
for the possible influence of predictors on survival. 
These data have been analyzed extensively (Byar 
and Corle, 1977; Kay, 1986; Lunn and McNeil, 
1995). Some grouping of the predictors was considered 
to be clinically meaningful. 


Predictors                        Value   Category 

Treatment (Rx)                    0       Placebo, 0.2 mg DES 
                                  1       1.0, 5.0 mg DES 
Age at diagnosis (Age)            0       ≤74 years 
                                  1       75-79 years 
                                  2       ≥80 years 
Standardized weight^a (Wt)        0       ≥100 
                                  1       80-99 
                                  2       <80 
Performance status (PF)           0       Normal 
                                  1       Limitation of activity 
History of CVD (Hx)               0       No 
                                  1       Yes 
Hemoglobin (Hg)                   0       ≥12 g/100 ml 
                                  1       9.0-11.9 g/100 ml 
                                  2       <9 g/100 ml 
Size of the primary lesion (SZ)   0       <30 cm^2 
                                  1       ≥30 cm^2 
Gleason Score^+ (SG)              0       ≤10 
                                  1       >10 

^a weight (kg) − height (cm) + 200 
^+ index of tumor invasiveness/aggressiveness 


Key risk factors related to the primary outcome 
of interest (cancer deaths) and the appropriate 
groupings are shown at the left. 

Primary interest was to assess the effect of treat¬ 
ment (Rx) adjusted for relevant risk factors in the 
presence of the competing risks. Notice from the 
table that the Rx variable is grouped into a binary 
variable by coding subjects receiving the placebo 
or 0.2 mg of DES as 0 and coding subjects receiv¬ 
ing 1.0 or 5.0 mg of DES as 1. 
Independence assumption (discussed 
later) 

Next 

Analysis of competing risks 
survival data 

Assume independent 
competing risks 


From a clinical perspective, these three competing 
risks can be considered to be independent (e.g., 
failure from heart disease and/or other causes of 
death is unrelated to risk of failure from prostate 
cancer). We discuss this “independence assump¬ 
tion” in more detail in a subsequent section of this 
chapter. 

We now describe the approach typically used to 
analyze competing risks data. This approach as¬ 
sumes that competing risks are independent. We 
illustrate this approach using the Byar data. 


IV. Method 1—Separate 
Models for Different 
Event Types 

• Use Cox (PH) model 

• Estimate separate hazards or 
HRs for each failure type 

• Other (competing) failure types 
are treated as censored 

• Persons lost to follow-up or 
withdrawal are also censored 

If only one failure type of interest 

Estimate only one hazard or HR 


Cause-specific hazard function 

$h_c(t) = \lim_{\Delta t \to 0} P(t \le T_c < t + \Delta t \mid T_c \ge t)/\Delta t$ 

where $T_c$ = time-to-failure from 
event c 

c = 1, 2, ..., C (# of event types) 

Cox PH cause-specific model 
(event-type c): 

$h_c(t,X) = h_{0c}(t)\exp\left[\sum_{i=1}^{p} \beta_{ic} X_i\right]$, 

c = 1, ..., C 

$\beta_{ic}$ allows effect of $X_i$ to differ by 
event-type 


The typical approach for analyzing competing 
risks data uses the Cox (PH) model to separately 
estimate hazards and corresponding hazard ratios 
for each failure type, treating the other (compet¬ 
ing) failure types as censored in addition to those 
who are censored from loss to follow-up or with¬ 
drawal. We refer to this approach as Method 1 
because we later describe an alternative approach 
(Method 2) that requires only a single model to be 
fit. 


If only one failure type is of primary interest, then 
the analysis might be restricted to estimating haz¬ 
ards or hazard ratios for that type only (but still 
treating the competing failure types as censored). 

To describe this method mathematically, we define 
the cause-specific hazard function shown at the 
left. The random variable T c denotes the time-to- 
failure from event type c. Thus, h c (t) gives the in¬ 
stantaneous failure rate at time t for event type c, 
given not failing from event c by time t. 


Using a Cox PH model that considers predictors 
$X = (X_1, X_2, \ldots, X_p)$, the cause-specific hazard 
model for event type c has the form shown at the 
left. Note that $\beta_{ic}$, the regression coefficient for the 
ith predictor, is subscripted by c to indicate that 
the effects of the predictors may be different for 
different event types. 
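
As a minimal sketch of the data preparation behind Method 1, the snippet below builds a separate event indicator for each cause-specific model: a failure from the cause of interest is coded 1, and both competing failures and usual censoring are coded 0. The records and the function name `cause_specific_status` are hypothetical and are not taken from the Byar data; fitting the Cox model itself is left to whatever survival package is in use.

```python
# Hypothetical records: (survival time, cause), where cause is "cancer",
# "cvd", "other", or None for usual censoring (loss to follow-up, withdrawal).
records = [
    (12, "cancer"),
    (20, "cvd"),
    (25, None),
    (31, "other"),
    (40, "cancer"),
]


def cause_specific_status(records, cause_of_interest):
    """Return (time, status) pairs for one cause-specific analysis.

    status = 1 only for failures from the cause of interest; failures from
    competing causes and usual censoring are both coded as status = 0.
    """
    return [(time, 1 if cause == cause_of_interest else 0)
            for time, cause in records]


# One dataset per cause-specific model; the same subjects appear in each.
cancer_data = cause_specific_status(records, "cancer")
cvd_data = cause_specific_status(records, "cvd")
print(cancer_data)
print(cvd_data)
```

Each recoded dataset would then be passed to a separate Cox PH fit, which is why the coefficients can differ by event type.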
BYAR DATA EXAMPLE 


Competing Risks: Cancer, CVD, Other 

Cause-specific model: Cancer 
No-interaction model: 

$h_{Ca}(t,X) = h_{0Ca}(t)\exp[\beta_{1Ca}\text{Rx} + \beta_{2Ca}\text{Age} + \beta_{3Ca}\text{Wt} + \beta_{4Ca}\text{PF} + \beta_{5Ca}\text{Hx} + \beta_{6Ca}\text{HG} + \beta_{7Ca}\text{SZ} + \beta_{8Ca}\text{SG}]$ 

$HR_{Ca}$(RX = 1 vs. RX = 0) = $\exp[\beta_{1Ca}]$ 

CVD and Other deaths are censored 


Cause-specific model: CVD 

$h_{CVD}(t,X) = h_{0CVD}(t)\exp[\beta_{1CVD}\text{Rx} + \beta_{2CVD}\text{Age} + \beta_{3CVD}\text{Wt} + \beta_{4CVD}\text{PF} + \beta_{5CVD}\text{Hx} + \beta_{6CVD}\text{HG} + \beta_{7CVD}\text{SZ} + \beta_{8CVD}\text{SG}]$ 

$HR_{CVD}$(RX = 1 vs. RX = 0) = $\exp[\beta_{1CVD}]$ 

Cancer and Other are censored 


Cause-specific model: Other 

$h_{OTH}(t,X) = h_{0OTH}(t)\exp[\beta_{1OTH}\text{Rx} + \beta_{2OTH}\text{Age} + \beta_{3OTH}\text{Wt} + \beta_{4OTH}\text{PF} + \beta_{5OTH}\text{Hx} + \beta_{6OTH}\text{HG} + \beta_{7OTH}\text{SZ} + \beta_{8OTH}\text{SG}]$ 

Cancer and CVD are censored 


We illustrate the above model using the Byar data 
involving the three competing risks and the eight 
predictors. 

A no-interaction cause-specific model for Cancer 
death (Ca) is shown at the left. From this model, 
the hazard ratio for the effect of Rx controlling for 
the other variables is $\exp[\beta_{1Ca}]$. 

Because Cancer is the event-type of interest, the 
two competing event-types, CVD and Other, need 
to be treated as censored in addition to usual censored 
observations (i.e., for persons who are either 
lost to follow-up or withdraw from the study). 

Similarly, if CVD is the event-type of interest, the 
cause-specific no-interaction hazard model and 
the hazard ratio formula for the effect of treatment 
are shown at the left, and the event types Cancer 
and Other would be treated as censored. 

And finally, if Other is the event-type of interest, 
the cause-specific no-interaction hazard model 
and the hazard ratio formula for the effect of treatment 
are shown at the left, and the event types 
Cancer and CVD would be treated as censored. 


Table 9.1. Edited Output for 
Cancer with CVD and Other 
Censored 

Var   DF   Coef     Std.Err.   P>|z|   Haz.Ratio 

Rx    1    -0.550   0.170      0.001   0.577 
Age   1     0.005   0.142      0.970   1.005 
Wt    1     0.187   0.138      0.173   1.206 
PF    1     0.253   0.262      0.334   1.288 
Hx    1    -0.094   0.179      0.599   0.910 
HG    1     0.467   0.177      0.008   1.596 
SZ    1     1.154   0.203      0.000   3.170 
SG    1     1.343   0.202      0.000   3.830 

Log likelihood = −771.174 


Edited output for each of the above three cause-specific 
models is now presented. 

First, we show the results for the event type Cancer, 
treating CVD and Other as censored. 


$HR_{Ca}$(RX = 1 vs. RX = 0) 
= exp(−0.550) = 0.577 
Wald ChiSq = (−0.550/0.170)² 
= 10.345 (P = 0.001) 

Signif. below .01 level 


From this output, the adjusted HR for the effect of 
Rx is 0.577 (=1/1.73). 

The P-value for a two-tailed Wald test is 0.001; thus 
Rx has a significant positive effect on survival for 
Cancer death with competing risks from CVD and 
Other deaths. 


95% CI for $\exp[\beta_{1Ca}]$: 
exp[−0.550 ± 1.96(0.170)] 
= (0.413, 0.807) 


Also, the 95% confidence interval for this HR is 
(0.413, 0.807) = (1/2.43, 1/1.24). 
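
The arithmetic behind the hazard ratio, Wald statistic, and 95% confidence interval can be sketched as follows, using the Rx coefficient and standard error from Table 9.1. The helper name `wald_summary` is hypothetical. Because the inputs here are the rounded values printed in the table, the results may differ slightly in the last digits from the printed output (for example, the printed Wald chi-square of 10.345 and upper limit of 0.807 presumably come from unrounded estimates).

```python
import math


def wald_summary(coef, std_err, z_crit=1.96):
    """Hazard ratio, Wald chi-square, and large-sample CI for exp(coef)."""
    hazard_ratio = math.exp(coef)
    wald_chisq = (coef / std_err) ** 2
    lower = math.exp(coef - z_crit * std_err)
    upper = math.exp(coef + z_crit * std_err)
    return hazard_ratio, wald_chisq, (lower, upper)


# Rx row of Table 9.1 (Cancer deaths; CVD and Other deaths censored).
hr, chisq, (ci_low, ci_high) = wald_summary(-0.550, 0.170)
print(f"HR = {hr:.3f}, Wald chi-square = {chisq:.2f}, "
      f"95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

The same function applies unchanged to the Rx rows of the CVD and Other models.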


Table 9.2. Edited Output for CVD 
with Cancer and Other Censored 

Var   DF   Coef     Std.Err.   P>|z|   Haz.Ratio 

Rx    1     0.354   0.174      0.042   1.425 
Age   1     0.337   0.134      0.012   1.401 
Wt    1     0.041   0.150      0.783   1.042 
PF    1     0.475   0.270      0.079   1.608 
Hx    1     1.141   0.187      0.000   3.131 
HG    1     0.018   0.202      0.929   1.018 
SZ    1    -0.222   0.364      0.542   0.801 
SG    1    -0.023   0.186      0.900   0.977 

Log likelihood = −763.001 


$HR_{CVD}$(RX = 1 vs. RX = 0) 
= exp(0.354) = 1.425 
Wald ChiSq = (0.354/0.174)² 
= 4.220 (P = 0.042) 

Signif. at .05 level 


We next provide edited output when the event-type 
is CVD, treating Cancer and Other as censored. 


Here, the adjusted HR for the effect of Rx is 1.425. 

The P-value for a two-tailed Wald test is 0.042; 
thus, Rx has a significant (P < .05) but negative 
effect on survival for CVD death with competing 
risks from Cancer and Other deaths. 


95% CI for $\exp[\beta_{1CVD}]$: 
exp[0.354 ± 1.96(0.174)] 
= (1.013, 2.004) 

The 95% confidence interval for this HR is (1.013, 
2.004). 


Table 9.3. Edited Output for Other 
with Cancer and CVD Censored 

Var   DF   Coef     Std.Err.   P>|z|   Haz.Ratio 

Rx    1    -0.578   0.279      0.038   0.561 
Age   1     0.770   0.204      0.000   2.159 
Wt    1     0.532   0.227      0.019   1.702 
PF    1     0.541   0.422      0.200   1.718 
Hx    1     0.023   0.285      0.935   1.023 
HG    1     0.357   0.296      0.228   1.428 
SZ    1     0.715   0.423      0.091   2.045 
SG    1    -0.454   0.298      0.127   0.635 

Log likelihood = −297.741 


Last, we provide edited output when the event- 
type is Other, treating Cancer and CVD as 
censored. 
$HR_{OTH}$(RX = 1 vs. RX = 0) 
= exp(−0.578) = 0.561 
Wald ChiSq = (−0.578/0.279)² 
= 4.29 (P = 0.038) 

Signif. at .05 level 


Here, the adjusted HR for the effect of Rx is 0.561 
(= 1/1.78). 


The P-value for a two-tailed Wald test is .038; thus, 
Rx has a significant (P < .05) protective effect on 
survival for Other deaths with competing risks 
from Cancer and CVD deaths. 


95% CI for $\exp[\beta_{1OTH}]$: 
exp[−0.578 ± 1.96(0.279)] 
= (0.325, 0.969) 

The 95% confidence interval for this HR is (0.325, 
0.969), which is somewhat imprecise. 


Not assessed in the above analysis: 
PH assumption 

Interaction of Rx with control vari¬ 
ables 


We have thus completed a competing risk analy¬ 
sis of the Byar data assuming that a no-interaction 
Cox PH model is appropriate. We haven’t actually 
checked the PH assumption for any of the vari¬ 
ables in the model nor have we assessed whether 
there is significant interaction between Rx and the 
other variables being controlled. Typically, these 
situations should be explored to ensure a more 
appropriate analysis. 


V. The Independence 
Assumption 

Censoring: a major concern in 
survival analysis 

Right censoring vs. left censoring 

⇓ 

• More often 

• Our focus 


At the beginning of this text in Chapter 1, we intro¬ 
duced the concept of censoring as a major concern 
for the analysis of survival data. We distinguished 
between right- and left-censoring and indicated 
our focus in the text would be on right-censoring, 
which occurs more often. 


Important assumption 

• Required for all approaches/ 
models described to this point 

• Relevant for competing risks 

Censoring Is Noninformative 
(Synonym: Independent) 


We also briefly introduced in Chapter 1 an impor¬ 
tant assumption about censoring that is required 
for all approaches/models for analyzing survival 
data described up to this point, including data 
with competing risks. This assumption is typically 
stated as follows: censoring is noninformative or 
independent. 
Typical context 

• No competing risks 

• Homogeneous sample 

Definition: noninformative censor¬ 
ing: 

Probability of being censored at 
time t does not depend on prog¬ 
nosis for failure at time t 


EXAMPLE 


Harry in homogeneous risk set at time t 
Event: death from any cause 


3 possible outcomes at time t 
Fail, not fail, unknown (censored) 
status 


Harry in poorer health than other 
subjects in risk set at time t 

⇓ 

Harry's prognosis for failing higher 
than other subjects in the risk set at 
time t 

Noninformative censoring 

⇕ 

Pr(C | PH) = Pr(C | GH) 
where 

C = censoring 
PH = poor health 
GH = good health 

Noninformative censoring can be defined in a con¬ 
text that assumes the absence of competing risks 
and a homogeneous study sample; that is, all sub¬ 
jects have the same values for covariate predictors. 

In the above context, we define noninformative 
censoring to mean that the probability of being 
censored for any subject in the risk set at time t 
does not depend on that subject’s prognosis for 
failure at time t. 


Suppose, for example, that Harry is in a homoge¬ 
neous risk set at time t (e.g., the entire risk set is, 
say, white, male, age 60) and the event of interest 
is death from any cause, so there are no competing 
risks. 

Given the above scenario, one of the following 
three outcomes can be observed on each subject, 
including Harry, in the risk set at time t: he can 
fail (i.e., die), not fail, or have unknown (censored) 
outcome from withdrawal or loss to follow-up. 

Now, suppose Harry is in much poorer health 
than other subjects in the risk set at time t. Then 
Harry's potential for failing at time t is likely to 
be higher than for other subjects in the risk set at 
time t. 


Now, if censoring is noninformative, then de¬ 
spite Harry's being in poorer health than other 
subjects at time t, a subject like Harry would 
just as likely be censored as any other sub¬ 
ject in the risk set, including subjects healthier 
than Harry who have a lower prognosis for failure. 
Harry more likely to be censored 
than other subjects in risk set 

⇓ 

Pr(C | PH) > Pr(C | GH) 

⇕ 

Informative Censoring 

(Synonym: DEPENDENT) 

Biased results 

$\hat{S}(t)$ overestimates $S(t)$ 
if 

large proportion of censored subjects 
actually failed at time t 

Competing risks 

⇓ 

Different types of censoring 

• Failure from competing risks 

• Lost to follow-up 

• Withdrawal 

• End of study 


Noninformative (i.e., 
Independent) Censoring with 
Competing Risks 

Harry in risk set at time t 

⇓ 

Harry just as likely to be censored 
as any other subject in risk set 
regardless of reason for censoring 
or prognosis for event-type A 

Byar data: 3 competing risks 

(Cancer, CVD, Other deaths) 

Noninformative censoring? 


Nevertheless, because Harry has poor health, he 
might be more likely to drop out of the study (i.e., 
be censored) at time t because of his poor health 
than other healthier subjects in the risk set. If 
so, Harry's censoring would be informative (or 
dependent). 

Informative censoring unfortunately can lead to 
biased results in a survival analysis. A bias can 
result because people who get censored are more 
likely to have failed than those not censored. Thus, 
the estimated survival probability at any time t 
may overestimate the true survival probability at 
time t if a large proportion of those with unknown 
status (i.e., censored) actually failed. 
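
The direction of this bias can be illustrated with a small made-up example, sketched below. Six hypothetical subjects all truly fail, but two of them drop out at the moment they would have failed and are therefore recorded as censored. The function `km_survival` is a bare-bones product-limit calculation written for this illustration, not a routine from any package, and the data are invented.

```python
def km_survival(data, t):
    """Kaplan-Meier estimate of S(t) from (time, status) pairs; status 1 = failed."""
    surv = 1.0
    failure_times = sorted({time for time, status in data
                            if status == 1 and time <= t})
    for tj in failure_times:
        at_risk = sum(1 for time, _ in data if time >= tj)
        failed = sum(1 for time, status in data if time == tj and status == 1)
        surv *= 1 - failed / at_risk
    return surv


# Hypothetical truth: all six subjects fail at the times shown.
true_data = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1)]
# What is observed if the subjects at times 2 and 3 drop out just as they
# would have failed (informative censoring): they are recorded as censored.
observed = [(1, 1), (2, 0), (3, 0), (4, 1), (5, 1), (6, 1)]

s_true = km_survival(true_data, 4)
s_observed = km_survival(observed, 4)
print(f"S(4) with true outcomes:      {s_true:.3f}")
print(f"S(4) with informative drops:  {s_observed:.3f}")
```

In this toy setting the estimate based on the observed data is larger than the one based on the true outcomes, which is the overestimation described above.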

When the survival analysis problem involves competing 
risks, the requirement of noninformative 
or independent censoring has the additional 
complication that different types of censoring 
are possible. That is, when focusing 
on the cause-specific hazard for event-type A, say, 
failures from competing risks other than A are also 
treated as censored, in addition to standard censoring 
from loss to follow-up, withdrawal, or ending of 
the study. 

Thus, for competing risks, censoring is noninfor¬ 
mative or independent if for a subject like Harry 
in the risk set at time t, Harry is just as likely to be 
censored at time t as any other subject in the risk 
set at t, regardless of the reason for censoring, in¬ 
cluding failure from a competing risk, or Harry’s 
prognosis for failing from event-type A. 


For example, in the Byar data set, there were 
three competing risks of interest, Cancer, CVD, 
or Other deaths. What, then, must we assume if 
censoring in this study were noninformative? 
Cause-specific focus: Cancer 
Noninformative censoring 

Harry just as likely to be censored 
as any other subject in risk set 
regardless of type of censoring 

Types of censoring—competing risks: 
CVD or Other death or usual censoring 


Suppose censoring is noninformative and we 
focus on cause-specific deaths for Cancer. Then 
any subject (e.g., Harry) in the risk set at time t 
with a given set of covariates is just as likely to 
be censored at time t as any other subject in the 
risk set with the same set of covariates regard¬ 
less of whether the reason for censoring is a CVD 
or Other death, withdrawal from study, or loss to 
follow-up. 


Informative (Dependent) 
Censoring 

Harry in poorer health than other 
subjects in risk set at time t 

Harry more likely to be censored 
than other subjects in risk set 

Initial context: subjects are homoge¬ 
neous 

More general context: each subject 
representative of subjects in the risk 
set with the same values of predic¬ 
tors 


EXAMPLE 


E = exposed, Age = 45, Gender = male 

Noninformative censoring 

⇓ 

All subjects in risk set for which 
E = exposed, Age = 45, Gender = male 
are equally likely to be censored 
regardless of type of censoring 


On the other hand, if censoring is informative 
(or dependent) and Harry was in poorer health 
than other subjects in the risk set at time t, he 
might be more likely to be censored, including dy¬ 
ing from CVD or Other cause, at time t than other 
subjects in the risk set. 


Recall that the context in which we initially de¬ 
fined noninformative censoring assumed subjects 
in the risk set at time t to be “homogeneous,” that 
is, having the same values for the predictors of 
interest. Actually, because predictors are typically 
included in one’s model, the more general context 
assumes that each subject in the risk set at time t is 
representative of all subjects with the same values 
of the predictors who survive to time t. 

For example, if the predictors are exposure (E), 
Age, and Gender, then noninformative censoring 
requires that a subject in the risk set at time t who 
is exposed, 45 years old, and male, is just as likely 
to be censored at time t as any other subject in 
the risk set at time t who is also exposed, 45 years 
old, and male. 
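
One way to write this covariate-conditional version of the assumption, extending the earlier statement Pr(C | PH) = Pr(C | GH), is sketched below. The conditioning on X = x is our notation for "subjects with the same values of the predictors" and is an assumption of this sketch rather than a formula given in the text.

```latex
% Noninformative censoring within covariate strata (sketch).
% C = censored at time t; X = x is a covariate pattern in the risk set;
% PH / GH = poor / good health (i.e., prognosis for failure).
\Pr(C \mid X = x,\ \mathrm{PH}) \;=\; \Pr(C \mid X = x,\ \mathrm{GH})
\quad \text{for every covariate pattern } x \text{ in the risk set at time } t
```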
Important assumption for 
competing risks 

Censoring is noninformative (i.e., 
independent) 

regardless of different types of 
censoring possible 

Synonym: Competing risks are 
independent 

Questions about independence as¬ 
sumption 

1. How can we determine whether 
this assumption is satisfied? 

2. How can we proceed with the 
analysis to consider the 
possibility that the assumption 
is not satisfied? 

Answer to 1: 

We can never explicitly prove the 
assumption is satisfied for given 
data. 

For example, Byar data: Cancer 
death 

Can't determine whether a subject 
would have died from Cancer if he 
hadn't died from CVD. 

CVD death 

⇓ 

Cancer death unobservable 

In general 

Failure from competing risk A 

⇓ 

Failure from competing risk B 
unobservable 


Answer to 2: 

Alternative strategies available 
but no strategy is always best 


The important message at this point when ana¬ 
lyzing competing risks survival data is that it is 
typically assumed that censoring is noninforma¬ 
tive or independent regardless of the different 
ways that censoring can occur, including failure 
from competing risks other than the cause-specific 
event-type of interest. A synonymous expression 
is to say that competing risks are independent, 
which we henceforth adopt in our remaining dis¬ 
cussion of this topic. 

So, if we typically require that competing risks are 
independent, (1) how can we determine whether 
this assumption is satisfied and (2) how can we 
proceed with the analysis to consider the possibil¬ 
ity that the assumption is not satisfied? 


Unfortunately, the answer to the first question is 
that we can never explicitly prove that compet¬ 
ing risks are or are not independent for a given 
dataset. For example, in the Byar dataset, we can¬ 
not determine for certain whether a subject who 
died from, say, CVD at time t would have died from 
Cancer if he hadn’t died from CVD. 


In other words, dying from Cancer at time t is an 
unobservable outcome for a subject who died from 
CVD at or before time t. More generally, failure 
from a competing risk at time t is unobservable for 
a subject who has already failed from a different 
competing risk up to time t. 

Because we can never fully determine whether 
competing risks are independent, how can we pro¬ 
ceed with the analysis of competing risks survival 
data? The answer is that there are several alterna¬ 
tive strategies, but no one strategy that is always 
best. 
Strategy 1 

Decide assumption satisfied on 
clinical/biological/other grounds 


One strategy is to decide on clinical/biological/ 
other grounds without any data analysis that the 
independence assumption is satisfied and then 
carry out the analysis assuming independence. 


EXAMPLE OF STRATEGY 
1—CANCER VS. CVD 


Decide independence if subjects who were 
censored because of CVD death were no 
more or less likely to have died from 

Cancer. 


For example, suppose the two competing risks are 
Cancer deaths and CVD deaths. Then you may 
decide that the assumption of independent com¬ 
peting risks is reasonable if at any time t, subjects 
who were censored because of CVD death were no 
more or less likely to have died from Cancer. 


Strategy 2 

Include common risk factors for 
competing risks in survival model 


EXAMPLE OF STRATEGY 
2—CANCER VS. CVD 


Include age and smoking in model to 
remove the common effects of these 
variables on competing risks 


A second strategy is to measure those variables 
that are common risk factors for competing risks 
being considered and then include those variables 
in the survival model. For example, with Cancer 
and CVD, perhaps including age and smoking sta¬ 
tus in the survival model might remove common 
effects on competing risks. 


Criticism of Strategies 1 and 2 

Assumptions cannot be 
verified with observed data 


A criticism of each of the above strategies is that 
they both rely on assumptions that cannot be ver¬ 
ified with the observed data. 


Strategy 3 

Use a sensitivity analysis 

• Considers “worst-case” 
violations of the 
independence assumption 


Another strategy (3) that can be used is a sensitivity 
analysis. As with Strategies 1 and 2, a sensitivity 
analysis cannot explicitly demonstrate whether 
the independence assumption is satisfied. However, 
this strategy allows the estimation of parameters 
by considering “worst-case” violations of the 
independence assumption. 


Sensitivity analysis 

• Determines extreme ranges for 
estimated parameters of one’s 
model 


Thus, using a sensitivity analysis, the investigator 
can determine extreme ranges for the estimated 
parameters in one's model under violation of the 
independence assumption. 


If “worst-case” not meaningfully 
different from independence 
then 

at most a small bias when 
assuming independence 


If such “worst-case” results do not meaningfully 
differ from results obtained under the indepen¬ 
dence assumption, then the investigator may 
conclude that at most a small bias can result from 
an analysis that assumes independence. 
If “worst-case” meaningfully 
different from independence 
then 

only extreme of bias but not 
actual bias is determined 


If, on the other hand, the sensitivity analysis pro¬ 
vides results that meaningfully differ from results 
obtained under the independence assumption, the 
investigator learns only the extremes to which 
the results could be biased without adjusting 
for the actual bias. 


EXAMPLE BYAR DATA 


Cause-specific focus: Cancer 
Censored: CVD deaths, Other deaths, 
usual censoring 

Worst-case situations: 

1. CVD or Other deaths are assumed 
to die of cancer instead 

2. CVD or Other deaths assumed to 
survive as long as the largest 
survival time observed in the study 


We now illustrate how a sensitivity analysis can be 
carried out using the Byar data, where we focus 
on the cause-specific survival for Cancer deaths, 
treating CVD and Other deaths as censored in ad¬ 
dition to usual censoring. 

The following two worst-case situations are con¬ 
sidered. (1) All subjects that are censored because 
of CVD or Other deaths are assumed to die of can¬ 
cer instead. (2) All subjects that are censored be¬ 
cause of CVD or Other deaths survive as long as 
the largest survival time observed in the study. 


Table 9.4. Edited Output for 
Cancer Worst-Case (1) 

Var   DF   Coef     Std.Err.   P>|z|   Haz.Ratio 

Rx    1    -0.185   0.110      0.092   0.831 
Age   1     0.286   0.087      0.001   1.332 
Wt    1     0.198   0.093      0.032   1.219 
PF    1     0.402   0.170      0.018   1.495 
Hx    1     0.437   0.112      0.000   1.548 
HG    1     0.292   0.120      0.015   1.339 
SZ    1     0.672   0.159      0.000   1.958 
SG    1     0.399   0.115      0.001   1.491 

Log likelihood = −1892.091 


Table 9.5. Edited Output for 
Cancer Worst-Case (2) 

Var   DF   Coef     Std.Err.   P>|z|   Haz.Ratio 

Rx    1    -0.411   0.169      0.015   0.663 
Age   1    -0.118   0.139      0.394   0.888 
Wt    1     0.086   0.138      0.532   1.090 
PF    1     0.125   0.254      0.622   1.133 
Hx    1    -0.266   0.179      0.138   0.767 
HG    1     0.314   0.169      0.063   1.369 
SZ    1     0.825   0.197      0.000   2.282 
SG    1     1.293   0.201      0.000   3.644 

Log likelihood = −839.631 


Table 9.4 and Table 9.5 give edited output for the 
above two scenarios followed by a repeat of the 
output previously shown (Table 9.1) under the independence 
assumption. 

To carry out worst-case scenario (1), the Status 
variable (indicating whether a subject failed or 
was censored) was changed in the dataset from 
0 to 1 for each subject that had a CVD or Other 
death. 


For worst-case scenario (2), the longest survival 
time observed in the study was 76 weeks. Thus, 
the survival time for each subject that had a CVD 
or Other death was changed in the dataset from 
the actual time of death to 76 weeks. 

To evaluate the results of the sensitivity analysis, 
we need to compare the output in Table 9.1, which 
assumes that competing risks are independent, 
with output for worst-case situations provided in 
Table 9.4 and Table 9.5. We focus this comparison 
on the estimated coefficient of the Rx variable. 
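
The two worst-case recodings described for the Byar data can be sketched on a toy dataset as follows. The record layout (`time`, `status`, `cause`) and the function names are hypothetical; only the recoding rules (Status changed from 0 to 1 for scenario 1, survival time changed to the longest observed time for scenario 2) come from the text.

```python
# Hypothetical records: status = 1 for a Cancer death, 0 for censored;
# cause records why the subject left the risk set.
records = [
    {"time": 10, "status": 1, "cause": "cancer"},
    {"time": 24, "status": 0, "cause": "cvd"},
    {"time": 40, "status": 0, "cause": "other"},
    {"time": 55, "status": 0, "cause": "lost"},
    {"time": 76, "status": 0, "cause": "end_of_study"},
]

COMPETING = {"cvd", "other"}


def worst_case_1(records):
    """Scenario (1): competing-risk deaths are recoded as Cancer deaths."""
    return [dict(r, status=1) if r["cause"] in COMPETING else dict(r)
            for r in records]


def worst_case_2(records):
    """Scenario (2): competing-risk deaths stay censored, but their survival
    time is moved to the longest survival time observed in the data."""
    longest = max(r["time"] for r in records)
    return [dict(r, time=longest) if r["cause"] in COMPETING else dict(r)
            for r in records]


wc1 = worst_case_1(records)
wc2 = worst_case_2(records)
print([r["status"] for r in wc1])
print([r["time"] for r in wc2])
```

Each recoded dataset would then be refit with the same Cox PH model used for Table 9.1, and the Rx rows compared.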
Table 9.1. (Repeated). Edited 
Output for Cancer with CVD and 
Other Censored (Assumes 
Competing Risks Independent) 

Var   DF   Coef     Std.Err.   P>|z|   Haz.Ratio 

Rx    1    -0.550   0.170      0.001   0.577 
Age   1     0.005   0.142      0.970   1.005 
Wt    1     0.187   0.138      0.173   1.206 
PF    1     0.253   0.262      0.334   1.288 
Hx    1    -0.094   0.179      0.599   0.910 
HG    1     0.467   0.177      0.008   1.596 
SZ    1     1.154   0.203      0.000   3.170 
SG    1     1.343   0.202      0.000   3.830 

Log likelihood = −771.174 


                               Var   DF   Coef     Std.Err.   P>|z|   Haz.Ratio 

Worst-Case (1):                Rx    1    -0.185   0.110      0.092   0.831 
Worst-Case (2):                Rx    1    -0.411   0.169      0.015   0.663 
Independent competing risks:   Rx    1    -0.550   0.170      0.001   0.577 



             WC(1)     WC(2)     Independent 

HRs          0.831     0.663     0.577 
P-values     0.092     0.015     0.001 
             (N.S.)    (<.05)    (<<.01) 

Number line of HRs: Independence (X) at .577; 
Nonindependence interval [.663, .831] 


The first line of output corresponding to the Rx 
variable is shown at the left for both worst-case 
scenarios together with the output obtained from 
assuming independent competing risks. 


These results for the Rx variable show considerable 
differences among all three scenarios. In particular, 
the three estimated hazard ratios are 0.831 
(=1/1.20), 0.663 (=1/1.51), and 0.577 (=1/1.73). 
Also, the P-values for the significance of the effect 
of Rx (0.092, 0.015, 0.001) lead to different conclusions 
about the effect of Rx. 

Note that the HR obtained from assuming inde¬ 
pendence does not lie between the HRs from the 
two worst-case scenarios. This should not be sur¬ 
prising because both worst-case scenarios assume 
nonindependence. 


If 

competing risks not independent 
then 

conclusions about the effect of Rx 
could be very different 


These results suggest that if the competing risks 
were not independent, then the conclusions about 
the effect of Rx could be somewhat different. 
But, 

• Have not demonstrated whether 
independence assumption 
satisfied 

• Have not obtained correct 
results under violation of 
independence assumption 

Worst-case (1) 

More departure from indepen¬ 
dence 

More realistic 

More emphasis 
than 

Worst-case (2) 

Sensitivity analysis: approaches can 
vary, for example, 

• Randomly select subset of 50% 
(or 25%) of subjects censored 
with CVD or Other deaths 

• Assume everyone in subset dies 
of Cancer 

Main point: 

Sensitivity analysis is one of sev¬ 
eral strategies to address concern 
about independence assumption 

Evaluates how badly biased the re¬ 
sults can get if independence not 
satisfied 

Nevertheless 

• No method to directly assess 
independence assumption 

• Typical analysis assumes 
independence assumption is 
satisfied 


However, these results do not demonstrate 
whether the independence assumption is satisfied, 
nor do they provide estimates of the unbiased hazard 
ratios and corresponding Wald tests under 
violation of the independence assumption. 


Worst-case (1) gives more departure from inde¬ 
pendence than worst-case (2). It can also be ar¬ 
gued that worst-case (1) is more realistic and thus 
should be emphasized more than worst-case (2), 
because subjects who were censored because of 
CVD or Other deaths would not be expected to 
survive the entire study if they hadn’t died. 


The previous observation suggests that the inves¬ 
tigator can vary the approach used to either carry 
out or interpret such a sensitivity analysis. For ex¬ 
ample, an alternative approach would be to mod¬ 
ify worst-case (1) by randomly selecting a subset 
of 50% (or 25%) of subjects censored with CVD or 
Other deaths and then assuming that everyone in 
this subset dies of Cancer instead. 
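
A minimal sketch of this modified worst-case (1) is shown below: a random fraction of the subjects censored by CVD or Other death is recoded as Cancer deaths. The record layout, the function name `partial_worst_case`, and the fixed seed are all hypothetical choices made for illustration; in practice one might repeat the random selection several times to see how much the results vary.

```python
import random

# Hypothetical records: status = 1 for a Cancer death, 0 for censored.
records = [
    {"time": 10, "status": 1, "cause": "cancer"},
    {"time": 18, "status": 0, "cause": "cvd"},
    {"time": 24, "status": 0, "cause": "cvd"},
    {"time": 33, "status": 0, "cause": "other"},
    {"time": 40, "status": 0, "cause": "other"},
    {"time": 55, "status": 0, "cause": "lost"},
]


def partial_worst_case(records, fraction, seed=0):
    """Recode a random `fraction` of competing-risk deaths as Cancer deaths."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    competing_idx = [i for i, r in enumerate(records)
                     if r["cause"] in ("cvd", "other")]
    n_recode = round(fraction * len(competing_idx))
    chosen = set(rng.sample(competing_idx, n_recode))
    return [dict(r, status=1) if i in chosen else dict(r)
            for i, r in enumerate(records)]


half = partial_worst_case(records, 0.50)
quarter = partial_worst_case(records, 0.25)
print(sum(r["status"] for r in half), sum(r["status"] for r in quarter))
```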

In any case, the main point here is that a sensitiv¬ 
ity analysis of the type we have illustrated is one of 
several strategies that can be used to address con¬ 
cern about the independence assumption. Such a 
sensitivity analysis allows the investigator to eval¬ 
uate how badly biased the results could get if the 
independence assumption is not satisfied. 


Nevertheless, as mentioned earlier, there is no 
method currently available that can directly as¬ 
sess the independence assumption nor guarantee 
correct estimates when the independence assump¬ 
tion is violated. Consequently, the typical survival 
analysis assumes that the independence assump¬ 
tion is satisfied when there are competing risks, 
even if this is not the case. 
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VI. Cumulative Incidence 
Curves (CIC) 

Survival curves S(t): 
provide summary information over 
time of survival experience 

KM: empirical approach for estimating 
survival curves 

Adjusted survival curves: generalized 
version of KM using a regression 
model to adjust for covariates 

Up to now: One event-type of 
interest (no competing risks) 

Competing risks: KM may not be as 
informative as when only one risk 


Hypothetical Study 

• n = 100 subjects 

• All subjects with prostate cancer 


Survt (months)   # Died   Cause 
3                99       CVD 
5                1        Cancer 


Study goal: cause-specific cancer 
survival 

Censored: CVD deaths 


Table 9.6. Hypothetical Survival 
Data 

j    t_j   n_j   m_j   q_j   S_Ca(t_j) (KM) 
0    0     100   0     0     1 
1    3     100   0     99    1 
2    5     1     1     -     0 

We have previously discussed (Chapter 1 and 
beyond) the use of survival curves to provide summary 
information over time of the survival experience 
of (sub)groups of interest. The Kaplan-Meier 
(KM) approach (Chapter 2), also called the 
product-limit approach, is a widely used empirical 
method for estimating survival curves. A generalized 
version of KM can be used with a regression 
(e.g., Cox) model to estimate adjusted survival 
curves (Chapter 3) that account for covariates. Up 
to now, such survival curves have been described 
only for the situation when there is only one event-type 
of interest. 


When competing risks are being considered, the 
KM survival curve may not be as informative as 
with only one risk. 

Consider the following hypothetical scenario: a 
5-month follow-up of 100 subjects with (say, 
prostate) cancer. Suppose that at 3 months from 
start of follow-up, 99 of the 100 subjects die from 
CVD. And at 5 months, the 1 remaining subject 
dies from prostate cancer. 


The goal of the study is to determine the cause- 
specific survival experience for cancer mortality, 
where a CVD death is considered as censored. 

Table 9.6 summarizes the survival experience in 
this hypothetical study. The first five columns of 
this table show the ordered failure-time interval 
number (j), the time of failure (t_j), the number 
in the risk set (n_j), the number who fail (m_j), and 
the number who are censored at each failure time 
(q_j), assuming that a subject who died of CVD at 
a given time is censored at that time. The last column 
shows the KM survival probabilities S_Ca(t_j) 
for cause-specific cancer at each failure time. 

Risk set at t_j = 5: 1 subject 

Pr(T > 5 | T ≥ 5) = (1 - 1)/1 = 0 

KM_Ca: S_Ca(t = 5) 
= S(t = 4) × Pr(T > 5 | T ≥ 5) 
= 1 × 0 
= 0 


From this table, we see that there is only one subject 
in the risk set at 5 months, and that this subject 
fails at month 5. The conditional probability 
of surviving past 5 months given survival up to 
5 months is (1 - 1)/1 = 0, so that the KM survival 
probability at 5 months is 0. 


KM_Ca => Risk_Ca(T ≤ 5) 
= 1 - 0 = 1 


Nevertheless, 

(1 cancer death) / (100 initial subjects) = 0.01 (small) 


Thus, use of the KM_Ca curve in the presence 
of competing risks (for CVD) suggests that the 
5-month risk for cancer death is 1; that is, 
1 - S_Ca(t = 5). Nevertheless, because 99 patients died of 
CVD instead of cancer, the proportion of the initial 
100 subjects who died of cancer is .01, a very small 
“risk” in contrast to the KM-based “risk” of 1. 
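
The contrast between the two "risks" can be checked numerically. The following is a minimal Python sketch, assuming the hypothetical data of Table 9.6 are stored as (time, cause) pairs; the function name km_survival and the data layout are illustrative choices, not notation from the text.

```python
def km_survival(records, event):
    """Kaplan-Meier survival for one event type.

    records: list of (time, cause) pairs. Any cause other than
    `event` is treated as censored at that time.
    Returns (time, survival) at each failure time of `event`.
    """
    surv = 1.0
    curve = []
    for t in sorted({time for time, cause in records if cause == event}):
        n_at_risk = sum(1 for time, _ in records if time >= t)
        m_failed = sum(1 for time, cause in records
                       if time == t and cause == event)
        surv *= 1 - m_failed / n_at_risk  # product-limit step
        curve.append((t, surv))
    return curve


# Table 9.6: 99 CVD deaths at month 3, 1 cancer death at month 5.
records = [(3, "CVD")] * 99 + [(5, "Cancer")]

km_curve = km_survival(records, "Cancer")
km_risk = 1 - km_curve[-1][1]  # 1 - S_Ca(t = 5)
crude_risk = sum(1 for _, c in records if c == "Cancer") / len(records)

print("KM-based risk:", km_risk)
print("Proportion dying of cancer:", crude_risk)
```

The KM-based value depends only on the single subject still in the risk set at month 5, whereas the crude proportion uses all 100 initial subjects as its denominator.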


Question: 

How many of the 99 CVD deaths 
would have died of cancer at t = 5 
if they hadn’t died of CVD at t = 3? 


A natural question at this point is, how many of the 
99 patients who died of CVD at 3 months would 
have died of cancer by 5 months instead if they 
hadn't died of CVD? 


Cannot answer: unobservable 

Unfortunately, we cannot ever answer this question 
because those dying of CVD cannot be observed 
further once they have died. 


Table 9.7. Hypothetical Survival 
Data Sensitivity Analysis A (99 
CVD Deaths Die of Cancer at t = 5) 

j    t_j   n_j   m_j   q_j   S_Ca(t_j) (KM) 
0    0     100   0     0     1 
1    3     100   0     0     1 
2    5     100   100   0     0 


But we can consider a sensitivity-type of analysis 
to see what might happen under certain alternative 
scenarios. Suppose, for example, that all 
99 subjects who died of CVD at 3 months would 
have died of cancer at 5 months if they hadn’t died 
of CVD. Also assume as before that the 100th subject 
survived up to 5 months but then immediately 
died. The survival experience for this situation is 
shown in Table 9.7. Notice that the KM survival 
probability at month 5 is 0, which is the same value 
as obtained in the original dataset. 

KM method assumes noninformative 
(i.e., independent) censoring 

=> 

Pr(T > 5 | censored at month 3) 
= Pr(T > 5 | survived to month 5) = 0 

99 CVD deaths would have been 
cancer deaths at month 5 

Table 9.8. Hypothetical Survival 
Data Sensitivity Analysis B (99 
CVD Deaths Survive Past t = 5) 

j    t_j   n_j   m_j   q_j   S_Ca(t_j) (KM) 
0    0     100   0     0     1 
1    3     100   0     0     1 
2    5     100   1     99    0.99 


The reason why Tables 9.6 and 9.7 give the same 
5-month survival probability (= 0) is that the KM 
method assumes noninformative (i.e., independent) 
censoring. For the original data (Table 9.6), 
noninformative censoring requires that those who 
were censored at month 3 were as likely to have 
died from cancer at month 5 as those who were 
in the risk set at month 5. Because the one person 
in the risk set at month 5 actually died from 
cancer, the KM method assumes that all 99 
CVD deaths being viewed as censored would have 
been cancer deaths at month 5, which is what is 
represented in Table 9.7. 

Now let’s consider a different version (B) of a sensitivity 
analysis. Suppose that all 99 subjects who 
died of CVD at 3 months would not have died of 
cancer at 5 months if they hadn’t died of CVD. Also 
assume as before that the 100th subject survived 
up to 5 months but then immediately died. The 
survival experience for this situation is shown in 
Table 9.8. 


Table 9.8: S_Ca(t = 5) = 0.99 
different from 
Table 9.6: S_Ca(t = 5) = 0 

The KM survival probability at month 5 is 0.99 
(i.e., close to 1), which is very different from 
the value of 0 obtained in the original dataset 
(Table 9.6). 


Focus on 1 - S(t) = Risk: 
Risk_Ca(T ≤ 5) = 1 - 0.99 = 0.01 

If we then focus on 1 - S(t) instead of S(t), sensitivity 
analysis B suggests that the 5-month risk for 
cancer death is 0.01 (i.e., 1 - 0.99). 
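
The three KM calculations (original data and sensitivity analyses A and B) can be reproduced with a short Python sketch. It assumes each scenario is summarized by its (n_j, m_j) pairs at the ordered times t = 3 and t = 5, read from Tables 9.6 to 9.8; the function name km_from_risk_sets is a hypothetical helper, not something defined in the text.

```python
def km_from_risk_sets(risk_sets):
    """Product-limit survival from (n_at_risk, n_failed) pairs,
    one pair per ordered failure time."""
    surv = 1.0
    for n_at_risk, n_failed in risk_sets:
        surv *= (n_at_risk - n_failed) / n_at_risk
    return surv


# (n_j, m_j) at t = 3 and t = 5 for cancer death as the event.
scenarios = {
    "Original data (Table 9.6)": [(100, 0), (1, 1)],
    "Analysis A (Table 9.7)": [(100, 0), (100, 100)],
    "Analysis B (Table 9.8)": [(100, 0), (100, 1)],
}

results = {name: km_from_risk_sets(rs) for name, rs in scenarios.items()}
for name, surv in results.items():
    print(f"{name}: S_Ca(5) = {surv:.2f}, risk = {1 - surv:.2f}")
```

The original data and analysis A give the same survival probability because censoring the 99 CVD deaths removes them from the risk set at month 5, which has the same effect on the product-limit estimate as assuming they all fail at that time.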


Table 9.6: Risk Ca (T < 5) = 1 
derived from the data 

Table 9.8: Risk Ca (T < 5) = 0.01 
derived from sensitivity analysis 

but also derived directly from 
data as a marginal probability 


We thus see that the KM-based risk of 1 computed 
from the actual data (Table 9.6) is quite different 
from the KM-based risk of .01 computed in 
Table 9.8, where the latter derives from a sensitivity 
analysis that does not use the actual data. Note, 
however, that a “risk” of .01 for cancer death can 
be derived directly from the actual data by treating 
the CVD deaths as cancer survivors. That is, 
.01 is the proportion of all subjects who actually 
died of cancer, with the 99 CVD deaths counted as 
not having died of cancer. This proportion is an 
example of what is called a marginal probability. 

Which is more informative, 

Risk_Ca(T ≤ 5) = 1 or 0.01? 

Answer: both informative 

“Risk” of .01 considers treatment 
utility 

for example, proportion of cancer 
patients needing treatment 


“Risk” of 1 considers etiology, 
providing competing risks are 
independent 

for example, cancer survival is unlikely 
after 5 months 


Main point 

KM survival curve may not be very 

informative 

• Requires independence 
assumption about competing 
risks 

• Independence assumption 
cannot be verified 

Alternative to KM: Cumulative 
Incidence Curve (CIC) uses 
marginal probabilities 


Only one risk: CIC = 1 - KM 

CIC with competing risks 

• Derived from cause-specific 
hazard function 

• Estimates marginal 
probability when competing 
risks are present 

• Does not require independence 
assumption 


So which of these two “risk” estimates (1 vs. .01) 
is more informative? Actually, they are both informative 
in different ways. 


The “risk” of .01 is informative from the standpoint 
of treatment utility for cancer because in 
these data, the proportion of cancer patients needing 
treatment is quite small when allowing for 
competing risks. 

On the other hand, the “risk” of 1, corresponding 
to the survival probability of 0, is informative from 
an etiologic standpoint providing competing risks 
are independent; for example, cancer patients who 
don’t die of CVD would be expected to die from 
their cancer by 5 months; that is, cancer survival 
is unlikely after 5 months. 

The main point of the above illustration is that 
when there are competing risks, the KM survival 
curve may not be very informative because it is 
based on an independence assumption about competing 
risks that cannot be verified. 


This has led to alternative approaches to KM for 
competing risk data. One such alternative, called 
the Cumulative Incidence Curve (CIC), involves 
the use of marginal probabilities as introduced 
above (Kalbfleisch and Prentice, 1980). 

In the simplest case, if there is only one risk, the 
CIC is (1 - KM). With competing risks, however, 
the CIC is derived from a cause-specific hazard 
function, provides estimates of the “marginal 
probability” of an event in the presence of competing 
events, and does not require the assumption 
that competing risks are independent. 

Marginal probabilities 
Useful to assess treatment utility 
in cost-effectiveness analyses 


for example, 0.01 (5-month) 
marginal probability for Cancer 
(Table 9.6) 


Such marginal probabilities are relevant to clinicians 
in cost-effectiveness analyses in which risk 
probabilities are used to assess treatment utility. 
For example, the .01 (5-month) marginal probability 
for cancer derived from hypothetical data 
in Table 9.6 illustrates small treatment utility for 
cancer. 


Steps to Construct CIC 


1. Estimate hazard at ordered 
failure times t_j for event type (c) 
of interest 

ĥ_c(t_j) = m_cj / n_j 

where 

m_cj = # of events for event type 
c at time t_j 

n_j = # of subjects at risk at time 
t_j 


2. Estimate 

S(t_{j-1}) = overall survival 
probability of 
surviving previous 
time (t_{j-1}) 

overall => subject survives 
all other competing 
events 


3. Compute estimated incidence of 
failing from event-type c at 
time t_j 

Î_c(t_j) = Ŝ(t_{j-1}) × ĥ_c(t_j) 

4. CIC(t_j) = Σ_{j'=1}^{j} Î_c(t_j') 

= Σ_{j'=1}^{j} Ŝ(t_{j'-1}) ĥ_c(t_j') 


How does one construct a CIC? We first estimate 
the hazard at ordered time points t_j when the 
event of interest occurs. This hazard estimate is 
simply the number of events that occur at t_j divided 
by the number at risk at t_j (analogous to the 
KM estimate). We can write this as ĥ_c(t_j) = m_cj/n_j 
where m_cj denotes the number of events for 
risk c at time t_j and n_j is the number of subjects 
at risk at that time. Thus, at any particular time, 
m_cj/n_j is the estimated proportion of subjects 
failing from risk c. 


To be able to fail at time t_j, the subject needs to be 
“around to fail”; that is, he must have survived the 
previous time when a failure occurred. The probability 
of surviving the previous time t_{j-1} is denoted 
S(t_{j-1}), where S(t) denotes the overall survival 
curve rather than the cause-specific survival 
curve S_c(t). We must consider “overall” survival 
here, because the subject must have survived all 
other competing events. 

The probability (i.e., incidence) of failing from 
event-type c at time t_j is then simply the probability 
of surviving the previous time period multiplied 
by ĥ_c(t_j). 


The cumulative incidence at time t_j is then the cumulative 
sum up to time t_j (i.e., from j' = 1 to 
j' = j) of these incidence values over all event-type 
c failure times. 

We illustrate the calculation of a CIC through an 
example. 

Consider another hypothetical study involving 
24 individuals receiving radiotherapy (XRT) for 
the treatment of head and neck cancer. Patients 
may die of the disease (cancer), other causes, or 
still be alive at the time of analysis. 

The data are shown below. 

Survival Time in Months 

Died of disease: 0.7, 3, 4.9, 6, 6, 6.9, 
10, 10.8, 17.1, 20.3 

Died of other causes: 1.5, 2.8, 3.8, 4.7, 
7, 10, 10, 11.2 

Censored: 3.2, 7.6, 10, 11, 15, 24.4 


EXAMPLE OF CIC CALCULATION 
(ANOTHER HYPOTHETICAL STUDY) 


• n = 24 subjects 

• All subjects receive treatment XRT 
for head and neck cancer 


Table 9.9. CIC Calculation: 
Hypothetical Data 

t_j    n_j   m_j   ĥ_ca(t_j)   Ŝ(t_{j-1})   Î_ca(t_j)   CIC(t_j) 
0      24    0     0           -            0           0 
0.7    24    1     0.042       1.000        0.042       0.042 
1.5    23    0     0           0.958        0           0.042 
2.8    22    0     0           0.916        0           0.042 
3.0    21    1     0.048       0.875        0.042       0.084 
3.2    20    0     0           0.833        0           0.084 
3.8    19    0     0           0.833        0           0.084 
4.7    18    0     0           0.789        0           0.084 
4.9    17    1     0.059       0.745        0.044       0.128 
6      16    2     0.125       0.702        0.088       0.216 
6.9    14    1     0.071       0.614        0.044       0.260 
7.0    13    0     0           0.570        0           0.260 
7.6    12    0     0           0.526        0           0.260 
10     11    1     0.091       0.526        0.048       0.308 
10.8   7     1     0.143       0.383        0.055       0.363 
11.0   6     0     0           0.328        0           0.363 
11.2   5     0     0           0.328        0           0.363 
15     4     0     0           0.262        0           0.363 
17.1   3     1     0.333       0.262        0.087       0.450 
20.3   2     1     0.5         0.175        0.088       0.537 
24.4   1     0     0           0.087        0           0.537 


The calculation of the CIC for these data is shown 
in Table 9.9. 

From the table, we can see that the highest CIC 
probability of 0.537 is reached at t = 20.3 
months, when the last observed event occurred. 
Thus, the cumulative risk (i.e., marginal probability) 
for a cancer death by month 20.3 is about 53.7% 
when allowing for the presence of the competing 
risk of death from other causes. 
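
The four construction steps can be sketched in Python and applied to the 24-subject data. This is a minimal illustration under stated assumptions: the function name cumulative_incidence, the (time, status) layout, and the labels "disease", "other", and "censored" are hypothetical choices, and a subject censored at time t is counted as still at risk at t.

```python
def cumulative_incidence(records, cause):
    """CIC for one event type.

    records: list of (time, status) pairs, with status "censored"
    or an event-type label.
    Returns rows (t_j, n_j, m_cj, hazard, S(t_{j-1}), incidence, CIC).
    """
    surv = 1.0  # overall survival just before the current time
    cic = 0.0
    rows = []
    for t in sorted({time for time, _ in records}):
        n_at_risk = sum(1 for time, _ in records if time >= t)
        m_cause = sum(1 for time, s in records if time == t and s == cause)
        m_any = sum(1 for time, s in records
                    if time == t and s != "censored")
        hazard = m_cause / n_at_risk          # step 1
        incidence = surv * hazard             # steps 2 and 3
        cic += incidence                      # step 4
        rows.append((t, n_at_risk, m_cause, hazard, surv, incidence, cic))
        surv *= 1 - m_any / n_at_risk         # update overall survival
    return rows


disease = [0.7, 3, 4.9, 6, 6, 6.9, 10, 10.8, 17.1, 20.3]
other = [1.5, 2.8, 3.8, 4.7, 7, 10, 10, 11.2]
censored = [3.2, 7.6, 10, 11, 15, 24.4]
records = ([(t, "disease") for t in disease]
           + [(t, "other") for t in other]
           + [(t, "censored") for t in censored])

table = cumulative_incidence(records, "disease")
for t, n, m, h, s, inc, cic in table:
    print(f"{t:5.1f} {n:3d} {m:2d} {h:6.3f} {s:6.3f} {inc:6.3f} {cic:6.3f}")
```

Because the sketch accumulates unrounded incidence values, its final CIC is about 0.536, while Table 9.9 reports 0.537 from summing entries already rounded to three decimals.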



Cumulative Incidence Curve 


Because the CIC curve describes “cumulative incidence,” 
a plot of the curve starts at 0 when t = 0 
and is a nondecreasing function up until the latest 
time of individual follow-up (t = 24.4). 

CIC Summary 

• Gives marginal probability 

• Does not use product limit 
formulation 

• Not included in mainstream 
commercially available 
statistical packages (e.g., SAS, 
STATA, SPSS) 

Independence of competing risks 
not required for CIC approach 

Nevertheless, CIC requires 

h(t) = h_c1(t) + h_c2(t) + ... + h_ck(t) 

where 

h(t) = overall hazard 

h_c(t) = hazard for event-type c 

Note: satisfied if 

• Mutually exclusive event-types 

• Non-recurrent events 

Comparing CICs for 2 or more 
groups: 

• Statistical test available (Gray, 
1988) 

• Analogous to log rank test 

• No independence assumption 

• Does not adjust for covariates 



[Figure: Cumulative Incidence Curves, Byar Data; horizontal axis: Months] 


Thus, as the example illustrates, the “marginal 
probability” estimated by the CIC does not use 
a product-limit (i.e., KM) formulation. Moreover, 
the computation of a CIC is currently not included 
in mainstream commercially available statistical 
packages. 


As mentioned earlier, the assumption of independent 
competing risks is not required for the calculation 
of the CIC, in contrast to the KM survival 
curve, which requires this assumption. 

Nevertheless, the CIC does require that the overall 
hazard is the sum of the individual hazards for 
all the risk types (Kalbfleisch and Prentice, 1980). 
The latter assumption will be satisfied, however, 
whenever competing risks are mutually exclusive 
and events are nonrecurrent; that is, one and only 
one event can occur at any one time and only once 
over time. 


Gray (1988) developed a test to compare two 
or more CICs. This test is analogous to the log 
rank test. The independence assumption is not 
required. However, this test does not adjust for 
covariates. 


The plot shown at the left gives the CICs for the 
two treatments for the Byar data. 

Gray’s test results: χ² = 6.6, df = 1 
p-value: 0.01 


PH model used to obtain CIC 

Independence of competing risks 
required 

(but CIC meaningful for treatment 
utility) 


Modeling CIC with covariates using 
PH model: Fine and Gray (1999) 

(CIC also called subdistribution 

function) 

Software available (Gebski, 1997) 


Using Gray’s test to compare the two CICs shown 
in the plot, we find the two curves to be significantly 
different (P = 0.01). 

So far, we have described the CIC without considering 
(Cox PH) models that account for covariates. 
However, when a PH model is used to obtain hazard 
ratio estimates for individual competing risks 
as an intermediate step in the computation of a 
CIC, the independence of competing risks is 
required. In any case, the CIC has a meaningful 
interpretation in terms of treatment utility regardless 
of whether competing risks are independent. 

Fine and Gray (1999) provide methodology for 
modeling the CIC with covariates using a proportional 
hazards assumption. They refer to the CIC 
curves as subdistribution functions. The mathematical 
details of these methods are beyond the 
scope of this text, but software is available that 
allows for such models to be fitted (Gebski, 1997). 


Fine and Gray model analogous to 
Cox PH model 

Effects of predictors (e.g., HRs) 
have similar interpretation 


The CIC models developed by Fine and Gray are 
analogous to the Cox PH model but, for any failure 
type, they model a CIC. The results from fitting 
these models have a similar interpretation regarding 
the effects of predictors in the model as can 
be derived from the (standard) Cox PH model approach 
for competing risks data. 


Table 9.10. Edited Output for 
Cancer with CVD and Other 
Censored, Byar Data (Fine and 
Gray CIC Approach) 

Var   DF   Coef     Std.Err.   p>|z|   Haz.Ratio 
Rx    1    -0.414   0.171      0.008   0.661 
Age   1    -0.112   0.145      0.221   0.894 
Wt    1    0.088    0.146      0.274   1.092 
PF    1    0.126    0.260      0.313   1.135 
Hx    1    -0.256   0.182      0.080   0.774 
HG    1    0.321    0.191      0.046   1.379 
SZ    1    0.841    0.207      0.001   2.318 
SG    1    1.299    0.198      0.001   3.665 

-2 LOG L = 1662.766546 


For the Byar data, the fitted CIC model that focuses 
on cancer deaths as the event-type of interest 
is shown in Table 9.10, below which we have 
repeated Table 9.1, which uses the standard competing 
risks Cox PH model approach. 


Table 9.1. (Repeated). Edited 
Output for Cancer with CVD and 
Other Censored (Standard Cox PH 
Approach) 

Var   DF   Coef     Std.Err.   p>|z|   Haz.Ratio 
Rx    1    -0.550   0.170      0.001   0.577 
Age   1    0.005    0.142      0.970   1.005 
Wt    1    0.187    0.138      0.173   1.206 
PF    1    0.253    0.262      0.334   1.288 
Hx    1    -0.094   0.179      0.599   0.910 
HG    1    0.467    0.177      0.008   1.596 
SZ    1    1.154    0.203      0.000   3.170 
SG    1    1.343    0.202      0.000   3.830 

Log likelihood = -771.174 


Although corresponding coefficient estimates and 
standard errors are different in the two outputs, 
both outputs are reasonably similar. 



           Fine and Gray CIC    Standard Cox PH 
           (Table 9.10)         (Table 9.1) 
β̂_Rx:     -0.414               -0.550 
HR_Rx:     0.661 (= 1/1.513)    0.577 (= 1/1.733) 
P-value:   0.008                0.001 


For example, the estimated coefficient of Rx is 
-0.414 in Table 9.10 versus -0.550 in Table 9.1. 
The corresponding hazard ratio estimates (e^β̂) are 
0.661 (= 1/1.513) and 0.577 (= 1/1.733), respectively, 
so that the strength of the association is slightly 
weaker using the Fine and Gray approach for 
these data, although both hazard ratios are highly 
significant. 
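
As a quick arithmetic check, the hazard ratios and their reciprocals can be recomputed from the two Rx coefficients. The snippet below is a small Python sketch; the dictionary names are illustrative, and the coefficients are simply copied from Tables 9.10 and 9.1.

```python
import math

# Estimated Rx coefficients copied from the two edited outputs.
coef_rx = {
    "Fine and Gray CIC (Table 9.10)": -0.414,
    "Standard Cox PH (Table 9.1)": -0.550,
}

hazard_ratios = {label: math.exp(b) for label, b in coef_rx.items()}

for label, hr in hazard_ratios.items():
    # Report the HR and its reciprocal, as in the text.
    print(f"{label}: HR = {hr:.3f} (= 1/{1 / hr:.3f})")
```

Both hazard ratios are below 1, and the Fine and Gray estimate lies closer to 1, which is what "slightly weaker" refers to.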


VII. Conditional Probability 
Curves (CPC) 

A third measure of failure risk: CPC 
(Other measures: 1 - KM and CIC) 

CPC_c = Pr(T_c ≤ t | T ≥ t) 

where T_c = time until event c 
occurs 

T = time until any 
competing risk 
event occurs 

for example, Byar data 

CPC_pc = Pr(T_pc ≤ 24 | T ≥ 24) 
where pc = prostate cancer 

Another approach to competing risks is called 
the Cumulative Conditional Probability or CPC. 

CPCs provide a third summary measure, in addition 
to (1 minus KM) and CIC, of the risk of failure 
of an event in the presence of competing risks. Put 
simply, the CPC_c is the probability of experiencing 
an event c by time t, given that an individual has 
not experienced any of the other competing risks by 
time t. 

Thus, with the Byar dataset, we may be interested 
in the risk of dying of prostate cancer 
at 24 months, given that the subject is alive at 
24 months to experience this event. 


CPC_c = CIC_c / (1 - CIC_c') 

where CIC_c' = CIC from risks other 
than c 


For risk type c, the CPC is defined by CPC_c = 
CIC_c/(1 - CIC_c'), where CIC_c' is the cumulative 
incidence of failure from risks other than risk c 
(i.e., all other risks considered together). 

Graphs of CPCs obtained from CICs 

Tests to compare CPCs: 

Pepe and Mori (1993): 2 curves 
Lunn (1998): g curves 

Graphs of CPC curves can be obtained from CIC 
curves and have been studied by Pepe and Mori (1993) 
and Lunn (1998). Pepe and Mori provide a test to compare 
two CPC curves. Lunn (1998) extended this 
test to g groups and allowed for strata. 
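
Obtaining a CPC value from CIC values is a one-line calculation. The Python sketch below applies the definition CPC_c = CIC_c/(1 - CIC_c'); the function name and the two input values are hypothetical and chosen only for illustration, not taken from the Byar data.

```python
def conditional_probability(cic_c, cic_other):
    """CPC for risk c from the CIC for risk c and the CIC for
    all other risks combined, evaluated at the same time t."""
    if not 0 <= cic_other < 1:
        raise ValueError("cic_other must lie in [0, 1)")
    return cic_c / (1 - cic_other)


# Hypothetical values at some time t (not from the text):
# 30% cumulative incidence for risk c, 20% for all other risks.
cpc = conditional_probability(0.30, 0.20)
print(f"CPC = {cpc:.3f}")
```

Applying the same function at each time point of a pair of CIC curves yields the corresponding CPC curve. The CPC is never smaller than the CIC for risk c because the denominator is at most 1.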



[Figure: Byar Data, Cumulative Conditional Probability; horizontal axis: Months] 


For the Byar data, the plot shown here gives 
the CPC curves comparing the two DES treatments. 
These curves give the probability of an 
event (death) from prostate cancer at any particular 
time given that the patient is alive at this time 
to experience the event. 

(Note: the Fine and Gray approach has not been 
extended to model the CPCs in a regression framework.) 


Test for equality: p-value = .01 
(Pepe-Mori) 

The Pepe-Mori test shows a significant difference 
between these two CPC curves. 


VIII. Method 2: The Lunn-McNeil 
(LM) Approach 


Method 1: separate estimates for 
each failure type, treating the competing 
failure types as censored 


We have previously (Section IV) described an approach 
(called Method 1) for analyzing competing 
risks data that uses the Cox (PH) model to separately 
estimate hazards and corresponding hazard 
ratios for each failure type, treating the other 
(competing) failure types as censored in addition 
to those not failing from any event-type. 


Method 2: LM Approach 

• Uses a single Cox (PH) model 

• Gives identical results as 
obtained from Method 1 

• Allows flexibility to perform 
statistical inferences not 
available from Method 1 


We now describe Method 2, called the Lunn-McNeil 
(LM) approach, that allows only one Cox 
PH model to be fit rather than separate models 
for each event-type (i.e., Method 1 above). 
This approach, depending on the variables put in 
the model, can give identical results to those obtained 
from separate models. Moreover, the LM 
approach allows flexibility to perform statistical 
inferences about various features of the competing 
risk models that cannot be conveniently assessed 
using Method 1. 

Table 9.11. Augmented Data for 
ith Subject at Time t_i Using LM 
Approach 

Subj   Stime   Status   D_1   D_2   D_3   ...   D_C   X_1    ...   X_p 
i      t_i     e_1      1     0     0     ...   0     X_i1   ...   X_ip 
i      t_i     e_2      0     1     0     ...   0     X_i1   ...   X_ip 
i      t_i     e_3      0     0     1     ...   0     X_i1   ...   X_ip 
...
i      t_i     e_C      0     0     0     ...   1     X_i1   ...   X_ip 


D_1, D_2, D_3, ..., D_C: indicators for event-types 


Definition 

D_c equals 1 for event-type c 
and 0 otherwise, c = 1, 2, ..., C 


for example, 

Event-type 1: D_1 = 1, D_2 = 0, 
D_3 = 0, ..., D_C = 0 
Event-type 2: D_1 = 0, D_2 = 1, 
D_3 = 0, ..., D_C = 0 
Event-type 3: D_1 = 0, D_2 = 0, 
D_3 = 1, ..., D_C = 0 


Table 9.12. Augmented Data for 
Subjects 1, 14, 16, and 503 from 
Byar Data Using LM Approach 

Subj   Stime   Status   CA   CVD   OTH   Rx   Age   Wt 
1      72      0        1    0     0     0    1     2 
1      72      0        0    1     0     0    1     2 
1      72      0        0    0     1     0    1     2 
14     49      1        1    0     0     0    0     0 
14     49      0        0    1     0     0    0     0 
14     49      0        0    0     1     0    0     0 
16     3       0        1    0     0     1    2     1 
16     3       1        0    1     0     1    2     1 
16     3       0        0    0     1     1    2     1 
503    41      0        1    0     0     0    1     0 
503    41      0        0    1     0     0    1     0 
503    41      1        0    0     1     0    1     0 


To carry out the LM approach, the data layout 
must be augmented. If there are C competing risks, 
the original data must be duplicated C times, one 
row for each failure type, as shown in Table 9.11 
for the ith subject with survival time t_i. 
Also, C dummy variables D_1, D_2, D_3, ..., D_C 
are created as shown in the table. The value of the 
status variable e_c, with c going from 1 to C, equals 
1 if event type c occurs at time t_i, and equals 0 
otherwise. The Xs in the table denote the predictors 
of interest and, as shown in the table, are 
identical in each row of the table. 

The dummy variables D_1, D_2, D_3, ..., D_C are indicators 
that distinguish the C competing risks (i.e., 
event-types). 

Thus, the dummy variable D_c equals 1 for event-type 
c and 0 otherwise. 


For example, for event-type 1, the Ds are D_1 = 1, 
D_2 = 0, D_3 = 0, ..., D_C = 0; for event-type 2, the 
Ds are D_1 = 0, D_2 = 1, D_3 = 0, ..., D_C = 0; and 
for event-type 3, the Ds are D_1 = 0, D_2 = 0, 
D_3 = 1, ..., D_C = 0. 


Table 9.12 shows observations for subjects 1, 14, 
16, and 503 from the Byar dataset. The CA, CVD, 
and OTH columns denote the C = 3 dummy variables 
D_1, D_2, and D_3, respectively. The last three 
columns, labeled Rx, Age, and Wt, give values for 
three of the eight predictors. 

In this table, there are three lines of data for each 
subject, which correspond to the three competing 
risks, Cancer death, CVD death, and Other death, 
respectively. The survival time (Stime) for subject 
1 is 72, for subject 14 is 49, for subject 16 is 3, and 
for subject 503 is 41. 

From the Status and Event (i.e., CA, CVD, OTH) 
columns, we can see that subject 1 was censored, 
subject 14 died of Cancer, subject 16 died of CVD, 
and subject 503 died from Other causes. 

For subject 1, the values for the predictors Rx, Age, 
and Wt were 0, 1, and 2, respectively. These values 
appear identically in the last three columns of the 
three rows for this subject. Similarly, for subject 
16, the predictor values for Rx, Age, and Wt are 
1, 2, and 1, respectively. 
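
The augmentation itself is mechanical and can be sketched in a few lines of Python. The function name augment_lm and the dictionary layout (with an "Event" key holding the observed event type, or None when censored) are assumptions made for this illustration; the four subjects are those shown in Table 9.12.

```python
def augment_lm(subjects, event_types):
    """Return one row per subject per event type (LM data layout).

    subjects: list of dicts with keys "Subj", "Stime", "Event"
    (None when censored) plus any predictor columns.
    """
    rows = []
    for subj in subjects:
        predictors = {k: v for k, v in subj.items()
                      if k not in ("Subj", "Stime", "Event")}
        for c in event_types:
            # Status e_c is 1 only on the row for the event that occurred.
            row = {"Subj": subj["Subj"], "Stime": subj["Stime"],
                   "Status": 1 if subj["Event"] == c else 0}
            # Dummy variables: 1 for this row's event type, 0 otherwise.
            row.update({d: 1 if d == c else 0 for d in event_types})
            # Predictor values are repeated unchanged on every row.
            row.update(predictors)
            rows.append(row)
    return rows


subjects = [
    {"Subj": 1, "Stime": 72, "Event": None, "Rx": 0, "Age": 1, "Wt": 2},
    {"Subj": 14, "Stime": 49, "Event": "CA", "Rx": 0, "Age": 0, "Wt": 0},
    {"Subj": 16, "Stime": 3, "Event": "CVD", "Rx": 1, "Age": 2, "Wt": 1},
    {"Subj": 503, "Stime": 41, "Event": "OTH", "Rx": 0, "Age": 1, "Wt": 0},
]

rows = augment_lm(subjects, ["CA", "CVD", "OTH"])
for r in rows:
    print(*r.values())
```

Each subject contributes three rows, and at most one of them has Status = 1; a censored subject such as subject 1 has Status = 0 on all three rows.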


General Stratified Cox LM Model 

g = 1, 2, ..., C 

h*_g(t, X) = h*_0g(t) 
× exp[β_1 X_1 + β_2 X_2 + ... + β_p X_p 
+ δ_21 D_2 X_1 + δ_22 D_2 X_2 + ... + δ_2p D_2 X_p 
+ δ_31 D_3 X_1 + δ_32 D_3 X_2 + ... + δ_3p D_3 X_p 
+ ... 
+ δ_C1 D_C X_1 + δ_C2 D_C X_2 + ... + δ_Cp D_C X_p] 

1st row: predictors X_1, X_2, ..., X_p 
2nd row: product terms 
D_2 X_1, D_2 X_2, ..., D_2 X_p 
... 
Cth row: product terms 
D_C X_1, D_C X_2, ..., D_C X_p 


To use the LM approach with augmented data to 
obtain identical results from fitting separate models 
(Method 1), an interaction version of a stratified 
Cox PH model is required. A general form for 
this model based on the notation used for the column 
heading variables in Table 9.11 is shown at 
the left. 

Recall that X_1, X_2, ..., X_p denote the p predictors 
of interest. D_2, D_3, ..., D_C are C - 1 dummy 
variables that distinguish the C event-types. Note 
that event-type 1 (g = 1) is the referent group, so 
variable D_1 is omitted from the model. Thus, the 
first row in the exponential formula contains the 
Xs, the second row contains product terms involving 
D_2 with each of the Xs, and so on, with the last 
(Cth) row containing product terms of D_C with 
each of the Xs. The strata (g = 1, ..., C) are the 
C event-types. 


LM Hazard Model for 
Event-Type 1 

h*_1(t, X) = h*_01(t) 
× exp[β_1 X_1 + β_2 X_2 + ... + β_p X_p] 
(D_2 = D_3 = ... = D_C = 0) 


For event-type 1 (g = 1), the above stratified Cox 
model simplifies to the expression shown at the 
left. Note that because g = 1, the values of the 
dummy variables D_2, D_3, ..., D_C are D_2 = D_3 = 
... = D_C = 0. 

No product terms X_j X_1 in the model 
HR_{g=1}(X_1 = 1 vs. X_1 = 0) = exp[β_1] 

Product terms X_j X_1 in the model 
HR_{g=1}(X_1 = 1 vs. X_1 = 0) 
= exp[β_1 + Σ β_j X_j] 

LM Hazard Model for 
Event-Type g (>1) 

h*_g(t, X) = h*_0g(t) 
× exp[β_1 X_1 + β_2 X_2 + ... + β_p X_p 
+ δ_g1 X_1 + δ_g2 X_2 + ... + δ_gp X_p] 
= h*_0g(t) exp[(β_1 + δ_g1) X_1 + (β_2 + δ_g2) X_2 
+ ... + (β_p + δ_gp) X_p] 

No product terms X_j X_1 in the model 
HR_g(X_1 = 1 vs. X_1 = 0) 
= exp[(β_1 + δ_g1)] 

Product terms X_j X_1 in the model 
HR_g(X_1 = 1 vs. X_1 = 0) 
= exp[(β_1 + δ_g1) 
+ Σ (β_j + δ_gj) X_j] 



Thus, for g = 1, if X_1 is a (0,1) variable, the other 
Xs are covariates, and there are no product terms 
X_j X_1 in the model, the formula for the HR for the 
effect of X_1 adjusted for the covariates is exp[β_1]. 
The more general exponential formula described 
in Chapter 3 would need to be used instead to obtain 
adjusted HRs if there are interaction terms in 
the model of the form X_j X_1. 


For any g greater than 1, the general hazard model 
simplifies to a hazard function formula that contains 
only those product terms involving the subscript 
g, because D_g = 1 and D_g' = 0 for g' not 
equal to g. 

With a little algebra we can combine coefficients 
of the same predictor to rewrite this hazard model 
as shown here. 

Thus, for g > 1, if X_1 is a (0,1) variable, the other 
Xs are covariates, and there are no product terms 
X_j X_1 in the model, the formula for the HR for the 
effect of X_1 adjusted for the covariates is 
exp[β_1 + δ_g1]. This HR expression would again need to be 
modified if the model contains product terms of 
the form X_j X_1. 


We now illustrate the above general LM model 
formulation using the Byar data. 

Recall that, using Method 1 (the separate models 
approach), a Cox hazard formula was used to fit a 
separate model for Cancer deaths, treating CVD and 
Other deaths as censored; this formula is repeated here. 

Also shown is the formula for the hazard ratio for 
the effect of the Rx variable, adjusted for other 
variables in the model. 

LM SC Model for Byar Data 

g = 1, 2, 3 

h*_g(t, X) = h*_0g(t) 
× exp[β_1 Rx + β_2 Age + ... + β_8 SG 
+ δ_21 D_2 Rx + δ_22 D_2 Age + ... + δ_28 D_2 SG 
+ δ_31 D_3 Rx + δ_32 D_3 Age + ... + δ_38 D_3 SG] 

1st row: predictors 
Rx, Age, Wt, PF, ..., SG 
2nd row: products 
D_2 Rx, D_2 Age, ..., D_2 SG 
3rd row: products 
D_3 Rx, D_3 Age, ..., D_3 SG 

D_2 = CVD and D_3 = OTH are (0,1) 
dummy variables that distinguish 
the 3 event-types 


Using the general LM data layout given in Table 
9.11, the stratified Cox LM model for the Byar data 
that incorporates C = 3 event-types is shown at 
the left. The strata, denoted by g = 1, 2, 3, identify 
the three event-types as Cancer, CVD, and Other, 
respectively. 

Notice that in the exponential part of the model,
there are 3 rows of terms that correspond to the 3
event-types of interest. The first row contains p = 8
predictors Rx, Age, Wt, PF, HX, HG, SZ, SG. The
second row contains product terms of the dummy
variable $D_2$ (the CVD indicator) with each of the 8
predictors. Similarly, the third row contains product
terms of $D_3$ (the Other indicator) with each of
the predictors.


$HR_{Ca}$(Rx = 1 vs. Rx = 0) = $\exp[\beta_1]$
$HR_{CVD}$(Rx = 1 vs. Rx = 0) = $\exp[\beta_1 + \delta_{21}]$
$HR_{OTH}$(Rx = 1 vs. Rx = 0) = $\exp[\beta_1 + \delta_{31}]$


From the above model, it follows that the hazard
ratio formulas for the effects of Rx corresponding
to each event-type are as shown at the left. Notice
that for CVD and Other deaths, the coefficient $\delta_{g1}$
of the product term $D_g$Rx, g = 2, 3, is added to the
coefficient $\beta_1$ of Rx in the exponential term.


Table 9.13. Edited Output for LM
Model (Interaction SC)-Byar Data

Var      DF   Coef     Std.Err.   p>|z|   Haz.Ratio
Rx       1    -0.550   0.170      0.001   0.577
Age      1     0.005   0.142      0.970   1.005
Wt       1     0.187   0.138      0.173   1.206
PF       1     0.253   0.262      0.334   1.288
Hx       1    -0.094   0.179      0.599   0.910
HG       1     0.467   0.177      0.008   1.596
SZ       1     1.154   0.203      0.000   3.170
SG       1     1.343   0.202      0.000   3.830
RxCVD    1     0.905   0.244      0.000   2.471
AgeCVD   1     0.332   0.196      0.089   1.394
WtCVD    1    -0.146   0.203      0.472   0.864
PFCVD    1     0.222   0.377      0.556   1.248
HxCVD    1     1.236   0.259      0.000   3.441
HGCVD    1    -0.449   0.268      0.094   0.638
SZCVD    1    -1.375   0.417      0.001   0.253
SGCVD    1    -1.366   0.275      0.000   0.255
RxOth    1    -0.028   0.327      0.932   0.972
AgeOth   1     0.764   0.248      0.002   2.147
WtOth    1     0.344   0.265      0.194   1.411
PFOth    1     0.288   0.497      0.562   1.334
HxOth    1     0.117   0.337      0.727   1.125
HGOth    1    -0.111   0.345      0.748   0.895
SZOth    1    -0.439   0.470      0.350   0.645
SGOth    1    -1.797   0.360      0.000   0.166


Table 9.13 shows edited output obtained from fitting
the above LM model.

The first eight rows of output in this table are
identical to the corresponding eight rows of output
in the previously shown Table 9.1 obtained
from Method 1, which fits a separate model for
Cancer deaths only. This equivalence results because
the first eight rows of the LM output correspond
to the reduced version of the LM model
when $D_2 = D_3 = 0$, which identifies Cancer as the
event of interest.

However, the remaining 16 rows of LM output
are not identical to the corresponding 8 rows of
Table 9.2 (for CVD) and 8 rows of Table 9.3 (for
Other). Note that the remaining 16 coefficients in
the LM output identify the $\delta_{gj}$ coefficients in the
LM model rather than the sum $(\beta_j + \delta_{gj})$ required
for computing the HR when g = 2 and 3.


log likelihood = -1831.92

$HR_{Ca}$(Rx = 1 vs. Rx = 0)
= exp[-0.550] = 0.577
= (1/1.733)

Wald ChiSq = $(-0.550/0.171)^2$
= 10.345 (P = 0.001)


From Table 9.13, the adjusted HR for the effect of
Rx when the event-type is Cancer can be read directly
off the output as 0.577. Also, the Wald statistic
for testing $H_0$: $\beta_1 = 0$ is highly significant
(P = .001). The corresponding 95% confidence interval
for this HR has the limits (0.413, 0.807).


95% CI for $\exp[\beta_{1Ca}]$:
exp[-0.550 ± 1.96(0.171)]
= (0.413, 0.807)

LM results for Cancer identical to
Method 1 results for Cancer

These results are identical to those obtained for
the adjusted HR, the Wald test, and the interval
estimate in Table 9.1 using Method 1 to assess
the effect of Rx on survival for cancer death.
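
The Cancer computations above can be reproduced in a few lines of Python. This is only a sketch: it uses the coefficient (-0.550) and the standard error (0.171) quoted in the text, and the function name `wald_and_ci` is introduced here for illustration rather than taken from any package.

```python
import math

def wald_and_ci(coef, se, z=1.96):
    """Return (HR, Wald chi-square, lower 95% limit, upper 95% limit)."""
    hr = math.exp(coef)                 # hazard ratio = exp(coefficient)
    wald = (coef / se) ** 2             # Wald chi-square with 1 df
    lower = math.exp(coef - z * se)     # CI limits are computed on the log
    upper = math.exp(coef + z * se)     # scale and then exponentiated
    return hr, wald, lower, upper

# Values for Rx (Cancer) as quoted in the text
hr, wald, lower, upper = wald_and_ci(-0.550, 0.171)
print(f"HR = {hr:.3f}, Wald = {wald:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

To three decimals the printed values should agree with those given above for the Cancer event-type.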


$HR_{CVD}$(Rx = 1 vs. Rx = 0)
= $\exp(\hat\beta_1 + \hat\delta_{21})$
= exp(-0.550 + 0.905)
= exp(0.355) = 1.426

$HR_{OTH}$(Rx = 1 vs. Rx = 0)
= $\exp(\hat\beta_1 + \hat\delta_{31})$
= exp(-0.550 - 0.028)
= exp(-0.578) = 0.561


Using Table 9.13 to obtain the adjusted HR for the
Rx effect when the event-type is CVD or Other,
we must exponentiate the sum $(\beta_1 + \delta_{g1})$ for g =
2 and 3, respectively.

LM results for CVD and Other identical
to Method 1 results for CVD
and Other

These results are shown at the left, and they are
identical to those obtained in Tables 9.2 and 9.3
using Method 1.
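
As a quick check of the arithmetic, the three event-specific HRs for Rx can be computed from the LM coefficients in Table 9.13. The variable names below are illustrative only.

```python
import math

# Coefficients read from Table 9.13 (LM model)
beta_rx = -0.550        # coefficient of Rx
delta_rx_cvd = 0.905    # coefficient of D2*Rx (row RxCVD)
delta_rx_oth = -0.028   # coefficient of D3*Rx (row RxOth)

# Cancer uses the main-effect coefficient alone; CVD and Other add
# the corresponding product-term coefficient before exponentiating.
hr_ca = math.exp(beta_rx)
hr_cvd = math.exp(beta_rx + delta_rx_cvd)
hr_oth = math.exp(beta_rx + delta_rx_oth)

print(round(hr_ca, 3), round(hr_cvd, 3), round(hr_oth, 3))
```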


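Wald test statistics for CVD and Other

$$\text{Wald}_{CVD} = \left[\frac{\hat\beta_1 + \hat\delta_{21}}{SE_{\hat\beta_1 + \hat\delta_{21}}}\right]^2$$

$$\text{Wald}_{OTH} = \left[\frac{\hat\beta_1 + \hat\delta_{31}}{SE_{\hat\beta_1 + \hat\delta_{31}}}\right]^2$$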


Note, however, that when the LM model is used to obtain
Wald test statistics and 95% confidence intervals
for the HRs for CVD and Other, the mathematical
formulas (shown at left for the Wald tests)
require standard errors of the sums
$(\hat\beta_1 + \hat\delta_{g1})$ for g = 2 and 3, whereas the output in
Table 9.13 gives only the individual standard errors
of $\hat\beta_1$, $\hat\delta_{21}$, and $\hat\delta_{31}$.


Computer packages provide for
computation of the above formulas

SAS: test statement
STATA: lincom command

SAS and STATA provide special syntax for such
computations: SAS's PHREG allows a "test" statement;
STATA allows a "lincom" command.
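
What such syntax computes is the standard error of a sum of two estimates, which depends on their covariance: Var(a + b) = Var(a) + Var(b) + 2 Cov(a, b). The sketch below shows the calculation for the CVD effect of Rx. The covariance is not reported in Table 9.13, so the value used here is an assumption made purely for illustration; in practice it would be read from the estimated variance-covariance matrix of the fitted model.

```python
import math

def se_of_sum(var_a, var_b, cov_ab):
    """Standard error of (a + b) from the variances and covariance."""
    return math.sqrt(var_a + var_b + 2.0 * cov_ab)

se_beta1 = 0.170      # Std.Err. of Rx in Table 9.13
se_delta21 = 0.244    # Std.Err. of RxCVD in Table 9.13

# ASSUMPTION (not in the output): covariance of the two estimates.
# It is set to minus the variance of the Rx coefficient for illustration.
cov_assumed = -se_beta1 ** 2

se_sum = se_of_sum(se_beta1 ** 2, se_delta21 ** 2, cov_assumed)
wald_cvd = ((-0.550 + 0.905) / se_sum) ** 2

print(round(se_sum, 3), round(wald_cvd, 2))
```

Under this assumed covariance the standard error of the sum comes out close to the value 0.174 reported for RxCVD in Table 9.14, but the agreement depends on the assumption and should not be read as output from the fitted model.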
Alternative LM formulation (LM_alt
model)

Output identical to Method 1
(Tables 9.1, 9.2, 9.3)

Nevertheless, there is an alternative version of the
LM model that avoids the need for special syntax.
This alternative formulation, which we call
the LM_alt model, results in output that is identical
to the output from the separate models (Method 1)
approach for analyzing competing risk data as
given in Tables 9.1 through 9.3.


IX. Method 2a: Alternative
Lunn-McNeil (LM_alt)
Approach

• Uses same data layout as
  Table 9.11
• Column headings:
  o Dummy variables
    $D_1, D_2, \ldots, D_C$
  o Predictor variables
    $X_1, X_2, \ldots, X_p$
• Above variables are transformed
  into product terms

The data layout required to fit the LM_alt model
is the same as shown earlier in Table 9.11. However,
the variables listed in the columns of this table,
namely, the dummy variables $D_1, D_2, \ldots, D_C$
and the predictor variables $X_1, X_2, \ldots, X_p$, serve
as basic variables that are transformed into product
terms that define the LM_alt model.


1st row of LM_alt model:
product terms
$D_1X_1, D_1X_2, \ldots, D_1X_p$
coefficients $\delta'_{11}, \ldots, \delta'_{1p}$

1st row of LM model:
predictors $X_1, X_2, \ldots, X_p$
coefficients $\beta_1, \ldots, \beta_p$

The primary difference in the two formulas
is that the first row of the exponential term
in the LM_alt model contains product terms
$D_1X_1, D_1X_2, \ldots, D_1X_p$ with coefficients denoted
$\delta'_{11}, \ldots, \delta'_{1p}$, whereas the first row in the LM
model contains the predictors $X_1, X_2, \ldots, X_p$
without product terms and coefficients denoted
$\beta_1, \ldots, \beta_p$.


General Stratified Cox
LM_alt Model

g = 1, ..., C

$$h'_g(t,X) = h'_{0g}(t)\,\exp[\delta'_{11}D_1X_1 + \delta'_{12}D_1X_2 + \cdots + \delta'_{1p}D_1X_p
  + \delta'_{21}D_2X_1 + \delta'_{22}D_2X_2 + \cdots + \delta'_{2p}D_2X_p
  + \delta'_{31}D_3X_1 + \delta'_{32}D_3X_2 + \cdots + \delta'_{3p}D_3X_p
  + \cdots
  + \delta'_{C1}D_CX_1 + \delta'_{C2}D_CX_2 + \cdots + \delta'_{Cp}D_CX_p]$$

The general form of the LM_alt model is shown at
the left. We have used a superscript prime (') to
distinguish the hazard model formula for the LM_alt
model from the corresponding formula for the LM
model given earlier.
• LM_alt and LM models are
  different
• Estimated regression
  coefficients will not be identical
• Estimated HRs, test statistics,
  and interval estimates are
  identical
• Computational formulas are
  different

LM_alt Hazard Model for
Event-Type 1

$$h'_1(t,X) = h'_{01}(t)\,\exp[\delta'_{11}X_1 + \delta'_{12}X_2 + \cdots + \delta'_{1p}X_p]$$
($D_1 = 1$, $D_2 = D_3 = \cdots = D_C = 0$)
$HR_{g=1}(X_1 = 1$ vs. $X_1 = 0) = \exp[\delta'_{11}]$
(no products $X_jX_1$ in model)

LM HR = $\exp[\beta_1]$

LM_alt Hazard Model for
Event-Type g (> 1)

$$h'_g(t,X) = h'_{0g}(t)\,\exp[\delta'_{g1}X_1 + \delta'_{g2}X_2 + \cdots + \delta'_{gp}X_p]$$
($D_g = 1$, and $D_{g'} = 0$ for $g' \ne g$)
$HR_g(X_1 = 1$ vs. $X_1 = 0) = \exp[\delta'_{g1}]$
(no products $X_jX_1$ in model)

LM HR = $\exp[\beta_1 + \delta_{g1}]$


Because the LM_alt model and the LM model are
different hazard models, their estimated regression
coefficients will not be identical. Nevertheless,
when used on the same dataset, the estimated
HRs of interest and their corresponding test
statistics and interval estimates are identical even
though the formulas used to compute these statistics
are different for the two models.

For g = 1 (i.e., event-type 1), the LM_alt model
simplifies to the expression shown at the left. Note that
because g = 1, the values of the dummy variables
are $D_1 = 1$, and $D_2 = D_3 = \cdots = D_C = 0$.

Thus, for g = 1, if $X_1$ is a (0,1) variable, the other
Xs are covariates, and there are no product terms
of the form $X_jX_1$ in the model, the formula for the
HR for the effect of $X_1$ adjusted for the covariates
is $\exp[\delta'_{11}]$.

Recall that for the LM model, the corresponding
HR formula also involved the coefficient of the $X_1$
variable, denoted as $\beta_1$.

For any g greater than 1, the general hazard model
simplifies to a hazard function formula that contains
only those product terms involving the subscript g,
because $D_g = 1$ and $D_{g'} = 0$ for $g' \ne g$.

Thus, for g > 1, if $X_1$ is a (0,1) variable, the other
Xs are covariates, and there are no products $X_jX_1$
in the model, the formula for the HR for the effect of
$X_1$ adjusted for the covariates is $\exp[\delta'_{g1}]$.

Recall that for the LM model, the exponential in
the HR formula involved the sum $(\beta_1 + \delta_{g1})$.
Statistical inferences (i.e., Wald test,
95% CI)

LM_alt model: need standard error for
$\hat\delta'_{g1}$ (directly provided by output)

LM model: standard error of
$(\hat\beta_1 + \hat\delta_{g1})$ (more complicated
computation)

Next: Byar data example of LM_alt
model

LM_alt SC Model for Byar Data

g = 1, 2, 3

$$h'_g(t,X) = h'_{0g}(t)\,\exp[\delta'_{11}D_1\text{Rx} + \delta'_{12}D_1\text{Age} + \cdots + \delta'_{18}D_1\text{SG}
  + \delta'_{21}D_2\text{Rx} + \delta'_{22}D_2\text{Age} + \cdots + \delta'_{28}D_2\text{SG}
  + \delta'_{31}D_3\text{Rx} + \delta'_{32}D_3\text{Age} + \cdots + \delta'_{38}D_3\text{SG}]$$

$D_1$ = CA, $D_2$ = CVD, and $D_3$ = OTH
are (0,1) dummy variables for
the 3 event-types

1st row: products
$D_1$Rx, $D_1$Age, ..., $D_1$SG
(LM predictors, Rx, Age, ..., SG)
2nd row: products
$D_2$Rx, $D_2$Age, ..., $D_2$SG
3rd row: products
$D_3$Rx, $D_3$Age, ..., $D_3$SG

$HR_{Ca}$(Rx = 1 vs. Rx = 0) = $\exp[\delta'_{11}]$
$HR_{CVD}$(Rx = 1 vs. Rx = 0) = $\exp[\delta'_{21}]$
$HR_{OTH}$(Rx = 1 vs. Rx = 0) = $\exp[\delta'_{31}]$


Thus for g > 1, statistical inferences about HRs
using the LM_alt model only require use of the standard
error for $\hat\delta'_{g1}$ that is directly provided in the
output.

In contrast, the LM model requires computing
the more complicated standard error of the sum
$(\hat\beta_1 + \hat\delta_{g1})$.

We now illustrate the above general LM_alt model
formation using the Byar data.

The stratified Cox (SC) LM_alt model that incorporates
the C = 3 event-types is shown at the left.
The strata, denoted by g = 1, 2, 3, identify the
three event-types, Cancer, CVD, and Other.

Notice that in the exponential part of the model,
the first row contains product terms of the dummy
variable $D_1$ (the CA indicator) with each of the 8
predictors Rx, Age, Wt, PF, HX, HG, SZ, SG. Recall
that in the LM version of this model, the first row
contained main effects of the predictors instead of
product terms.

The second and third rows, as in the LM model,
contain product terms of the dummy variable $D_2$
(the CVD indicator) and $D_3$ (the OTH indicator),
respectively, with each of the 8 predictors.

From the above model, it follows that the HR formulas
for the effects of Rx corresponding to each
event-type are of the form $\exp(\delta'_{g1})$, where $\delta'_{g1}$ is
the coefficient of the product term $D_g$Rx in the
LM_alt model.
$$\text{Wald}_g = \left[\frac{\hat\delta'_{g1}}{SE_{\hat\delta'_{g1}}}\right]^2$$

g = 1 (CA), 2 (CVD), 3 (OTH)

Consequently, Wald test statistics (shown at the
left) and confidence intervals for these HRs use
standard errors that are directly obtained from the
standard error column from the output obtained
for the LM_alt model.
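
Because each coefficient and its standard error sit in the same output row, the Wald statistics and confidence limits follow row by row. The sketch below uses the three Rx rows of the LM_alt output (Table 9.14, shown later in this section); the dictionary layout and names are illustrative only.

```python
import math

# (coefficient, standard error) for the Rx product terms in Table 9.14
rx_rows = {
    "RxCa":  (-0.550, 0.170),
    "RxCVD": ( 0.354, 0.174),
    "RxOth": (-0.578, 0.279),
}

results = {}
for name, (coef, se) in rx_rows.items():
    wald = (coef / se) ** 2                    # Wald chi-square, 1 df
    ci = (math.exp(coef - 1.96 * se),          # 95% CI for the HR
          math.exp(coef + 1.96 * se))
    results[name] = (math.exp(coef), wald, ci)

for name, (hr, wald, ci) in results.items():
    print(f"{name}: HR={hr:.3f} Wald={wald:.2f} CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

No covariance term is needed here, which is the practical advantage of the LM_alt parameterization over the LM model.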


Statistical inference information

LM_alt model: directly provided by
output

LM model: not directly provided by
output (requires additional
computer code)

Thus, the LM_alt model allows the user to perform
statistical inference procedures using the information
directly provided in the computer output,
whereas the LM model requires additional computer
code to carry out more complicated computations.


Table 9.14. Edited Output for SC
LM_alt Model-Byar Data

Var      DF   Coef     Std.Err.   p>|z|   Haz.Ratio
RxCa     1    -0.550   0.170      0.001   0.577
AgeCa    1     0.005   0.142      0.970   1.005
WtCa     1     0.187   0.138      0.173   1.206
PFCa     1     0.253   0.262      0.334   1.288
HxCa     1    -0.094   0.179      0.599   0.910
HGCa     1     0.467   0.177      0.008   1.596
SZCa     1     1.154   0.203      0.000   3.170
SGCa     1     1.343   0.202      0.000   3.830
RxCVD    1     0.354   0.174      0.042   1.429
AgeCVD   1     0.337   0.134      0.012   1.401
WtCVD    1     0.041   0.150      0.783   1.042
PFCVD    1     0.475   0.270      0.079   1.608
HxCVD    1     1.141   0.187      0.000   3.131
HGCVD    1     0.018   0.202      0.929   1.018
SZCVD    1    -0.222   0.364      0.542   0.801
SGCVD    1    -0.023   0.186      0.900   0.977
RxOth    1    -0.578   0.279      0.038   0.561
AgeOth   1     0.770   0.204      0.000   2.159
WtOth    1     0.532   0.227      0.019   1.702
PFOth    1     0.541   0.422      0.200   1.718
HxOth    1     0.023   0.285      0.935   1.023
HGOth    1     0.357   0.296      0.228   1.428
SZOth    1     0.715   0.423      0.091   2.045
SGOth    1    -0.454   0.298      0.127   0.635

log likelihood = -1831.916


Table 9.14 shows edited output obtained from fitting
the above LM_alt model.

The first eight rows of output in this table are identical
to the eight rows of output in the previously
shown Table 9.1 obtained from Method 1, which
fits a separate model for Cancer deaths only, treating
CVD and Other deaths as censored.

The next eight rows in the table are identical to
the eight rows of output in the previous Table 9.2,
which fits a separate model for CVD deaths only,
treating Cancer and Other deaths as censored.

The last eight rows in the table are identical to the
eight rows of output in the previous Table 9.3,
which fits a separate model for Other deaths only,
treating Cancer and CVD deaths as censored.

Table 9.14 (LM_alt) output
identical to
Tables 9.1, 9.2, 9.3 (Method 1)
output combined

Thus, the output in Table 9.14 using the single
LM_alt model gives identical results to what is obtained
from fitting 3 separate models in Tables
9.1, 9.2, and 9.3.
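
The link between the two parameterizations can be checked numerically: for g = 2 and 3, each LM_alt coefficient should equal the LM main-effect coefficient plus the matching LM product-term coefficient. The sketch below applies this check to the coefficients printed in Tables 9.13 and 9.14; the tolerance of 0.002 is an assumption chosen to allow for three-decimal rounding in the printed output.

```python
# Predictor order: Rx, Age, Wt, PF, Hx, HG, SZ, SG
lm_main = [-0.550, 0.005, 0.187, 0.253, -0.094, 0.467, 1.154, 1.343]   # Table 9.13
lm_cvd  = [0.905, 0.332, -0.146, 0.222, 1.236, -0.449, -1.375, -1.366]
lm_oth  = [-0.028, 0.764, 0.344, 0.288, 0.117, -0.111, -0.439, -1.797]

alt_cvd = [0.354, 0.337, 0.041, 0.475, 1.141, 0.018, -0.222, -0.023]   # Table 9.14
alt_oth = [-0.578, 0.770, 0.532, 0.541, 0.023, 0.357, 0.715, -0.454]

def max_gap(main, delta, alt):
    """Largest absolute difference between (main + delta) and alt."""
    return max(abs(b + d - a) for b, d, a in zip(main, delta, alt))

TOL = 0.002  # allows for rounding of the printed coefficients
print(max_gap(lm_main, lm_cvd, alt_cvd) < TOL)
print(max_gap(lm_main, lm_oth, alt_oth) < TOL)
```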
X. Method 1 (Separate
Models) versus Method
2 (LM Approach)

Why bother with LM or LM_alt
models when you can simply
fit 3 separate models?

Answer: Can perform statistical inferences
that cannot be done when
fitting 3 separate models

The reader may have the following question at this
point: Why bother with the LM or LM_alt models
as long as you can get the same results from fitting
three separate models using Method 1? The
answer is that the LM or LM_alt model formulation
allows for performing statistical inferences about
various features of the competing risk models that
cannot be conveniently assessed when fitting three
separate models using Method 1.


LM Model for Byar Data

g = 1, 2, 3

$$h^*_g(t,X) = h^*_{0g}(t)\,\exp[\beta_1 \text{Rx} + \beta_2 \text{Age} + \cdots + \beta_8 \text{SG}
  + \delta_{21} D_2\text{Rx} + \delta_{22} D_2\text{Age} + \cdots + \delta_{28} D_2\text{SG}
  + \delta_{31} D_3\text{Rx} + \delta_{32} D_3\text{Age} + \cdots + \delta_{38} D_3\text{SG}]$$

We illustrate such "extra" inference-making using
the LM model previously described for the Byar
data example. This model is shown again at the
left. Equivalent inferences can be made using the
LM_alt model (see Exercises at end of this chapter).


Inference question: Byar data

No-interaction SC LM model
versus
interaction SC LM model

One inference question to consider for the Byar
data is whether a no-interaction SC LM model
is more appropriate than the interaction SC LM
model defined above.


No-interaction SC model

g = 1, 2, 3

$$h^*_g(t,X) = h^*_{0g}(t)\,\exp[\beta_1 \text{Rx} + \beta_2 \text{Age} + \cdots + \beta_8 \text{SG}]$$

The no-interaction SC model is shown here at the
left.

Assumes

$HR_{Ca}(X_i) = HR_{CVD}(X_i) = HR_{OTH}(X_i) = HR(X_i)$
for any $X_i$ variable

for example, Rx = 0 vs Rx = 1:
$HR_{Ca}$(Rx) = $HR_{CVD}$(Rx) = $HR_{OTH}$(Rx) = $\exp[\beta_1]$

This model assumes that the hazard ratio for the
effect of a single predictor (say, binary) $X_i$ adjusted
for the other variables in the model is the same for
each event-type of interest.

For example, in the above no-interaction SC LM
model the hazard ratio for the effect of Rx is
$\exp[\beta_1]$ for each g, where $\beta_1$ is the coefficient of
Rx.
$H_0$: all $\delta_{gj} = 0$,
g = 2, 3; j = 1, 2, ..., 8

where $\delta_{gj}$ is the coefficient of $D_gX_j$ in the
interaction SC LM model

To carry out the comparison of the interaction
with the no-interaction SC LM models, the null
hypothesis being tested is that the coefficients of
the 16 product terms ($\delta_{gj}$) in the interaction SC
model are equal to zero.

Likelihood Ratio Test

$LR = -2\log L_R - (-2\log L_F)$
approx $\chi^2_{16}$ under $H_0$
R = no interaction SC (reduced) model
F = interaction SC (full) model

This null hypothesis is conveniently tested using
the LM model with a likelihood ratio test statistic
obtained by subtracting the -2 log L statistics of
the two models being compared. The test has 16
degrees of freedom, the number of $\delta_{gj}$ coefficients
set equal to zero under $H_0$.


Table 9.15. Edited Output for
No-Interaction SC LM Model-Byar Data

Var   DF   Coef     Std.Err.   P>|z|   Haz.Ratio
Rx    1    -0.185   0.110      0.092   0.831
Age   1     0.287   0.087      0.001   1.332
Wt    1     0.198   0.093      0.032   1.219
PF    1     0.402   0.170      0.018   1.495
Hx    1     0.437   0.112      0.000   1.548
HG    1     0.292   0.120      0.015   1.339
SZ    1     0.672   0.159      0.000   1.958
SG    1     0.399   0.115      0.001   1.491

log likelihood = -1892.091


Table 9.15 gives the output resulting from the
no-interaction SC LM model for the Byar dataset. In
this table, there is one coefficient corresponding
to each of the eight predictors in the model, as
should be the case for a no-interaction SC model.
Nevertheless, baseline hazard functions $h^*_{0g}(t)$ are
allowed to be different for different g even if the
coefficients are the same for different g.

Table 9.15: Log likelihood_R
= -1892.091

Table 9.13: Log likelihood_F
= -1831.916

From Table 9.15, we find that the log-likelihood
statistic for the reduced (no-interaction SC) model
is -1892.091. From Table 9.13 (or 9.14), the
log-likelihood statistic for the full (interaction SC)
model is -1831.916.

$LR = -2\log L_R - (-2\log L_F)$
= -2(-1892.091) - (-2(-1831.916))
= 120.35 approx $\chi^2_{16}$ under $H_0$
(P < 0.001)

The likelihood ratio test statistic (LR) is then calculated
to be 120.35, as shown at the left. This
statistic has an approximate chi-square distribution
with 16 degrees of freedom under $H_0$.
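
The likelihood ratio computation can be sketched in Python using only the standard library. The helper `chi2_sf_even_df` is written here for illustration; it relies on the closed-form upper-tail probability of a chi-square distribution with an even number of degrees of freedom, so it is not a general-purpose routine.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi-square_df > x) for even df.

    For df = 2k: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = 1.0    # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

loglik_reduced = -1892.091   # Table 9.15, no-interaction SC model
loglik_full = -1831.916      # Table 9.13 (or 9.14), interaction SC model

lr = -2.0 * loglik_reduced - (-2.0 * loglik_full)
p_value = chi2_sf_even_df(lr, df=16)

print(round(lr, 2), p_value < 0.001)
```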


Reject $H_0$: interaction SC model
more appropriate than
no-interaction SC model

The P-value is less than .001, which indicates a
highly significant test result, thus supporting use
of the full-interaction SC model.

Cancer and CVD very different
clinically
⇓
$HR_{Ca}$(Rx = 1 vs. 0)
≠ $HR_{CVD}$(Rx = 1 vs. 0)

For the Byar dataset, the decision to reject the
no-interaction SC model makes sense when considering
that two of the competing risks are Cancer
deaths and CVD deaths. Because Cancer and CVD
are clinically very different diseases, one would expect
the effect of any of the predictors, particularly
Rx, on time to failure to be different for these
different disease entities.


DIFFERENT STUDY EXAMPLE

Competing risks: Stage 1 vs. Stage 2
Breast Cancer
⇓
$HR_{stg1}$(Rx = 0 vs. 1) = $HR_{stg2}$(Rx = 0 vs. 1)
Plausible
⇓
Non-interaction SC Cox reasonable
depending on similarity of competing risks

Suppose, however, the competing risks for a different
study had been, say, two stages of breast
cancer. Then it is plausible that the effect from
comparing two treatment regimens might be the
same for each stage. That is, a no-interaction SC
LM model may be (clinically) reasonable depending
on the (clinical) similarity between competing
risks.


Unstratified LM model (LM_U):

$$h^{\bullet}(t,X) = h^{\bullet}_0(t)\,\exp[\gamma_1 \text{CVD} + \gamma_2 \text{OTH}
  + \beta^{\bullet}_1 \text{Rx} + \beta^{\bullet}_2 \text{Age} + \cdots + \beta^{\bullet}_8 \text{SG}
  + \delta^{\bullet}_{21} D_2\text{Rx} + \delta^{\bullet}_{22} D_2\text{Age} + \cdots + \delta^{\bullet}_{28} D_2\text{SG}
  + \delta^{\bullet}_{31} D_3\text{Rx} + \delta^{\bullet}_{32} D_3\text{Age} + \cdots + \delta^{\bullet}_{38} D_3\text{SG}]$$

Returning again to the Byar data example, another
variation of the LM model is shown at the left and
denoted LM_U. This is a Cox PH model applied to
the augmented data of Table 9.11 that is not stratified
on the competing risks (i.e., there is no subscript
g in the model definition). We have used
a superscript bullet (•) to distinguish the LM_U
model from the LM and LM_alt models.


LM_U model: CVD and OTH
included in model

LM model: CVD and OTH not
included in model

(Both LM_U and LM models use
augmented dataset)

LM_U model: need to check PH
assumption (Chapter 4)

PH assumption not satisfied
⇓
Use LM instead of LM_U model

The LM_U model includes the two event-type
dummy variables CVD and OTH in the model,
rather than stratifying on these variables. As for
the LM model, the fit of the LM_U model is based
on the augmented dataset given in Table 9.11.

Because the LM_U model is an unstratified Cox
PH model, we would want to use the methods of
Chapter 4 to assess whether the PH assumption is
satisfied for the CVD and OTH variables (as well
as the other variables). If the PH assumption is
found wanting, then the (stratified Cox) LM model
should be used instead.
PH assumption satisfied
⇓
Determine HRs using exponential
formula (Chapter 3)

If the PH assumption is satisfied, hazard ratios for
the effects of various predictors in the LM_U model
can be determined using the standard exponential
formula described in Chapter 3 for the Cox PH
model.

Cancer survival (CVD = OTH = 0):
$HR_{Ca}$(Rx = 1 vs. Rx = 0) = $\exp[\beta^{\bullet}_1]$

In particular, to obtain the hazard ratio for the effect
of Rx on Cancer survival, we would specify
CVD = OTH = 0 in the model and then exponentiate
the coefficient of the Rx variable in the model,
as shown at the left.

CVD survival (CVD = 1, OTH = 0):


HRcvd(Rx = 1 vs. Rx =0) 

= expfyj + 5‘J 

Other survival (CVD = 0, OTH = 1): 


HRoth(Rx = 1 vs. Rx = 0) 
= expfyj + 6*[] 


Similar HR expressions (but involving $\gamma_1$ and $\gamma_2$
also) are obtained for the effect of Rx when CVD
deaths and Other deaths are the event-types of
interest.


Essential point 

Use of single LM-type model offers 
greater flexibility for the analysis 
than allowed using Method 1 


At this point, we omit further description of results
from fitting the LM_U model to the Byar
dataset. The essential point here is that the use
of a single LM-type model with augmented data
allows greater flexibility for the analysis than can
be achieved when using Method 1 to fit separate
hazard models for each event-type of interest.


XI. Summary


Competing Risks 


Each subject can experience only 
one of several different types of 
events over follow-up 


This chapter has considered survival data in which
each subject can experience only one of several different
types of events over follow-up. The different
events are called competing risks.

Typical approach

• Cox PH model
• Separate model for each
  event-type
• Other (competing) event-types
  treated as censored

We have described how to model competing risks
survival data using a Cox PH model. The typical
approach for analyzing competing risks data is
to perform a survival analysis for each event-type
separately, where the other (competing) event-types
are treated as censored categories.
Drawbacks

1. Requires independent competing
   risks; that is, censored subjects
   have same risk as all subjects in
   risk set

2. Product-limit (e.g., KM) curve
   has questionable interpretation

Several alternative strategies regarding
independence assumption:
No single strategy is always best

There are two primary drawbacks to the above
method. One problem is the requirement that
competing risks be independent. This assumption
will not be satisfied if those subjects censored from
competing risks do not have the same risk for failing
as subjects who are not censored from the
cause-specific event of interest at that same time.

A second drawback is that the estimated product-limit
survival curve obtained from fitting separate
Cox models for each event-type has questionable
interpretation when there are competing risks.

Regarding the independence assumption, several
alternative strategies for addressing this issue are
described, although no single strategy is always
best.


Sensitivity analysis: worst-case violations
of independence assumption

For example, subjects censored
from competing risks treated in
analysis as if

• All event-free
• All experience event of interest

• Independence assumption not
  easily verifiable
• Typical analysis assumes
  independence assumption is
  satisfied

A popular strategy is a sensitivity analysis, which
allows the estimation of parameters by considering
worst-case violations of the independence assumption.
For example, subjects censored from
competing risks might be treated in the analysis
as either all being event-free or all experiencing
the event of interest.

Unfortunately, the independence assumption is
not easily verifiable. Consequently, the typical
competing risks analysis assumes that the independence
assumption is satisfied even if this is not
the case.


CIC Alternative to KM 

• Derived from cause-specific 
hazard function 

• Estimates marginal 
probability when competing 
risks are present 

• Does not require independence 
assumption 

• Useful to assess treatment utility 
in cost-effectiveness analyses 


To avoid a questionable interpretation of the KM
survival curve, the primary alternative to using
KM is the Cumulative Incidence Curve (CIC),
which estimates the marginal probability of an
event. Marginal probabilities are relevant for assessing
treatment utility whether or not competing risks
are independent.
$$CIC_c(t_j) = \sum_{j'=1}^{j} \hat I_c(t_{j'}) = \sum_{j'=1}^{j} \hat S(t_{j'-1})\,\hat h_c(t_{j'})$$

$\hat h_c(t_{j'})$ = estimated hazard at ordered
failure time $t_{j'}$ for the event-type (c)

$\hat S(t_{j'-1})$ = overall survival probability
of previous time ($t_{j'-1}$)

CIC

• Does not use product limit
  formulation
• Not included in mainstream
  commercially available
  statistical packages (e.g., SAS,
  STATA, SPSS)

PH model used to obtain CIC
⇓
Independence of competing risks required

Modeling CIC with covariates using
PH model: Fine and Gray (1999)

Software available (Gebski, 1997)
Fine and Gray model analogous to
Cox PH model

Alternative to CIC

$CPC_c = Pr(T_c \le t \mid T \ge t)$

where $T_c$ = time until event c
occurs
$T$ = time until any
competing risk event
occurs

$CPC_c = CIC_c/(1 - CIC_{c'})$

where $CIC_{c'}$ = CIC from risks other
than c


The formula for calculating the CIC is shown
at the left. The $\hat h_c(t_{j'})$ in the formula is the estimated
hazard at survival time $t_{j'}$ for the event-type
(c) of interest. The term $\hat S(t_{j'-1})$ denotes
the overall survival probability of previous time
($t_{j'-1}$), where "overall survival" indicates a subject
that survives all competing events.
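
A small sketch of this summation follows. Two assumptions are made for illustration and are not stated in this section: the cause-specific hazard at each failure time is estimated as (number of type-c events) / (number at risk), and overall survival is the product-limit estimate based on events of any type. The input counts are hypothetical, not the Byar data.

```python
def cumulative_incidence(n_at_risk, d_type_c, d_all):
    """Cumulative incidence of event-type c at each ordered failure time.

    n_at_risk: number at risk just before each failure time
    d_type_c:  number of type-c events at each failure time
    d_all:     number of events of any type at each failure time
    """
    cic = []
    surv_prev = 1.0     # overall survival just before the first failure time
    running = 0.0
    for n, dc, d in zip(n_at_risk, d_type_c, d_all):
        hazard_c = dc / n                 # ASSUMED hazard estimate for type c
        running += surv_prev * hazard_c   # S(t_{j'-1}) * h_c(t_{j'})
        cic.append(running)
        surv_prev *= 1.0 - d / n          # update overall survival
    return cic

# Hypothetical counts at three ordered failure times
example = cumulative_incidence([10, 8, 5], [1, 1, 0], [2, 1, 1])
print([round(v, 3) for v in example])
```

Note that the overall survival term is updated with events of every type, which is why the CIC is not a product-limit estimate for event-type c alone.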


As the formula indicates, the CIC is not estimated
using a product-limit formulation. Also, its computation
is not included in mainstream commercially
available standard statistical packages.

If a proportional hazard model is used to obtain
hazard ratio estimates for individual competing
risks as an intermediate step in the computation of
a CIC, the assumption of independent competing
risks is still required.

Recent work of Fine and Gray (1999) provides
methodology for modeling the CIC with covariates
using a proportional hazards assumption. Software
is available for this method (Gebski, 1997,
Tai et al., 2001), although not in standard commercial
packages.

An alternative to the CIC is the Conditional Probability
Curve (CPC). For risk type c, $CPC_c$ is the
probability of experiencing an event c by time t,
given that an individual has not experienced any of
the other competing risks by time t.

The CPC can be computed from the CIC through
the formula $CPC_c = CIC_c/(1 - CIC_{c'})$, where $CIC_{c'}$
is the cumulative incidence of failure from risks
other than risk c (i.e., all other risks considered
together).
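
The CPC formula is a one-line computation once the two cumulative incidences are available. The function name and the numeric inputs below are hypothetical and serve only to illustrate the formula.

```python
def conditional_probability(cic_c, cic_other):
    """CPC_c = CIC_c / (1 - CIC_c'), with CIC_c' from all other risks."""
    if not 0.0 <= cic_other < 1.0:
        raise ValueError("cumulative incidence of other risks must be in [0, 1)")
    return cic_c / (1.0 - cic_other)

# Hypothetical values at some time t: 20% have failed from risk c,
# 50% from all other risks combined.
cpc = conditional_probability(0.20, 0.50)
print(round(cpc, 3))
```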
Tests to compare CPCs:

Pepe and Mori (1993): 2 curves
Lunn (1998): g curves

Pepe-Mori provide a test to compare two CPC
curves. Lunn (1998) extended this test to g groups
and allows for strata.

Method 2: LM Approach

• Uses a single Cox (PH) model
• Gives identical results as
  obtained from Method 1
• Allows flexibility to perform
  statistical inferences not
  available from Method 1

We have also described an alternative approach,
called the Lunn-McNeil (LM) approach, for analyzing
competing risks data. The LM approach
allows only one model to be fit rather than separate
models for each event-type (Method 1).
This method is equivalent to using Method 1.
The LM model also allows the flexibility to perform
statistical inferences to determine whether
a simpler version of an initial LM model is more
appropriate.


Augmented Data for ith Subject at
Time $t_i$ Using LM Approach

Subj  Stime  Status  D1  D2  D3  ...  DC   X1    ...  Xp
i     t_i    e_1     1   0   0   ...  0    X_i1  ...  X_ip
i     t_i    e_2     0   1   0   ...  0    X_i1  ...  X_ip
i     t_i    e_3     0   0   1   ...  0    X_i1  ...  X_ip
...
i     t_i    e_C     0   0   0   ...  1    X_i1  ...  X_ip

To carry out the LM approach, the data layout
must be augmented. If there are C competing risks,
the original data must be duplicated C times, one
row for each failure type.
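
A minimal sketch of the augmentation step is given below. The field names (`subj`, `stime`, `event_type`, `covariates`) and the coding of `event_type` (0 for censored, c for failure from risk c) are assumptions introduced for this illustration; the status value in row c is taken to be 1 only when the subject failed from risk c.

```python
def augment(subjects, n_event_types):
    """Expand each subject into one row per competing event-type."""
    rows = []
    for s in subjects:
        for c in range(1, n_event_types + 1):
            row = {
                "subj": s["subj"],
                "stime": s["stime"],
                # status e_c: 1 only if the subject failed from risk c
                "status": 1 if s["event_type"] == c else 0,
            }
            # dummy variables D1..DC: exactly one equals 1 in each row
            for k in range(1, n_event_types + 1):
                row[f"D{k}"] = 1 if k == c else 0
            row.update(s["covariates"])   # predictors repeat in every row
            rows.append(row)
    return rows

# Hypothetical subject who failed from risk 2 at time 14
subjects = [{"subj": 1, "stime": 14, "event_type": 2, "covariates": {"Rx": 1}}]
augmented = augment(subjects, n_event_types=3)
for r in augmented:
    print(r)
```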


g = 1, 2, ..., C

$$h^*_g(t,X) = h^*_{0g}(t)\,\exp[\beta_1X_1 + \beta_2X_2 + \cdots + \beta_pX_p
  + \delta_{21}D_2X_1 + \delta_{22}D_2X_2 + \cdots + \delta_{2p}D_2X_p
  + \delta_{31}D_3X_1 + \delta_{32}D_3X_2 + \cdots + \delta_{3p}D_3X_p
  + \cdots
  + \delta_{C1}D_CX_1 + \delta_{C2}D_CX_2 + \cdots + \delta_{Cp}D_CX_p]$$

To use the LM approach with augmented data to
obtain identical results from fitting separate models
(Method 1), an interaction version of a stratified
Cox (SC) PH model is required. A general form
for this model is shown at the left.

LM model: need standard error of
$(\hat\beta_1 + \hat\delta_{g1})$
(special syntax required for computation)

The LM model can be used to obtain Wald test
statistics and 95% confidence intervals for HRs
separately for each competing risk. These statistics
require obtaining standard errors of the sums
of estimated regression coefficients (e.g., $\hat\beta_1 +
\hat\delta_{g1}$). Such computations require special syntax
available in standard computer packages such as
SAS, STATA, and SPSS.
Alternative LM formulation: LM_alt
model

LM_alt yields output identical to
Method 1

Nevertheless, there is an alternative formulation of
the LM model that avoids the need for special syntax.
This alternative formulation, called the LM_alt
model, yields output that is identical to the output
from the separate models (Method 1) approach for
analyzing competing risk data.

1st row of LM_alt model
Product terms $D_1X_1, D_1X_2, \ldots, D_1X_p$
Coefficients $\delta'_{11}, \ldots, \delta'_{1p}$

1st row of LM model
Predictors $X_1, X_2, \ldots, X_p$
Coefficients $\beta_1, \ldots, \beta_p$

The primary difference in the two formulas
is that the first row of the exponential term
in the LM_alt model contains product terms
$D_1X_1, D_1X_2, \ldots, D_1X_p$ with coefficients denoted
$\delta'_{11}, \ldots, \delta'_{1p}$, whereas the first row in the LM
model contains the predictors $X_1, X_2, \ldots, X_p$
without product terms and coefficients denoted
$\beta_1, \ldots, \beta_p$.


LM_alt model: $\text{Wald}_g = \hat\delta'_{g1} / SE_{\hat\delta'_{g1}}$

directly obtained from output


Using the LM_alt model, Wald test statistics (shown
at the left) and confidence intervals use standard
errors that are directly obtained from the standard
error column from the output obtained for
the LM_alt model.


Statistical inference information

LM_alt model: directly provided by
output

LM model: not directly provided by
output (requires additional
computer code)


Thus, the LM_alt model allows the user to perform
statistical inference procedures using the information
directly provided in the computer output,
whereas the LM model requires additional computer
code to carry out more complicated computations.
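
To illustrate how little computation is needed when a coefficient and its standard error can be read from a single output row, here is a minimal Python sketch. The function name `wald_summary` and the numeric inputs are hypothetical, and the interval is the usual large-sample 95% interval.

```python
import math


def wald_summary(coef, se, z_crit=1.96):
    """Wald Z, hazard ratio, and large-sample 95% CI from one output row."""
    wald_z = coef / se
    hr = math.exp(coef)
    ci_low = math.exp(coef - z_crit * se)
    ci_high = math.exp(coef + z_crit * se)
    # Two-sided P-value from the standard normal distribution.
    p_value = math.erfc(abs(wald_z) / math.sqrt(2.0))
    return {"wald_z": wald_z, "hr": hr, "ci": (ci_low, ci_high), "p": p_value}


# Hypothetical coefficient and standard error, as they might be read
# from the output row for a product term such as D_2 * X_1.
result = wald_summary(coef=0.40, se=0.20)
print(result)
```

With the LM model, the same function could be used only after first computing the sum of two coefficients and the standard error of that sum.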


Advantage of LM (Method 2) over
Method 1:

LM offers flexibility for statistical
inferences to consider simpler
models


An advantage of using either the LM or LM_alt approach
instead of fitting separate models (Method 1)
is the flexibility to perform statistical inferences
that consider simpler versions of an interaction SC
LM model.
For example,

No-interaction SC LM model
versus
interaction SC LM model

Unstratified LM model
versus
SC LM model


For example, one inference question to consider is
whether a no-interaction SC LM model is more appropriate
than an interaction SC model. A different
question is whether an unstratified LM model
is more appropriate than a stratified LM model.
These questions can be conveniently addressed using
a single (i.e., LM) model instead of fitting separate
models (Method 1).
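
One common way to compare a no-interaction SC LM model with the interaction SC LM model is a likelihood ratio (LR) test on the product-term coefficients. The sketch below is a hedged illustration, not the text's own computation: the function names and the two -2 log-likelihood values are hypothetical. The degrees of freedom, (C - 1) x p, are the number of delta coefficients in the general LM model form, and the chi-square tail probability is computed with a standard series for the incomplete gamma function so that only the Python standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom > x),
    using the series for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, 1000):
        term *= half_x / (a + n)
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def lr_test(neg2_loglik_reduced, neg2_loglik_full, n_types, n_predictors):
    """LR statistic for the no-interaction (reduced) versus the
    interaction (full) SC LM model."""
    lr = neg2_loglik_reduced - neg2_loglik_full
    # Number of product-term (delta) coefficients dropped in the reduced model.
    df = (n_types - 1) * n_predictors
    return lr, df, chi2_sf(lr, df)


# Hypothetical -2 log-likelihood values; C = 3 event types and p = 8
# predictors mirror the dimensions of the Byar example.
lr, df, p = lr_test(3820.1, 3800.0, n_types=3, n_predictors=8)
print(lr, df, p)
```

A large P-value would favor the simpler no-interaction model. The same pattern applies to other nested comparisons, with the degrees of freedom adjusted to the number of coefficients removed.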


Overall, 

• Can use standard computer 
packages 

• Independence assumption 
required 


Overall, in this chapter, we have shown that competing
risks data can be analyzed using standard
computer packages provided it can be assumed
that competing risks are independent.


Detailed Outline


I. Overview (page 396) 

A. Focus: competing risks —analysis of survival data 
in which each subject can experience only one of 
different types of events over follow-up. 

B. Analysis using Cox PH model. 

C. Drawbacks to typical approach that uses Cox 
model. 

D. Alternative approaches for analysis. 

II. Examples of Competing Risks Data 

(pages 396-398) 

A. Dying from either lung cancer or stroke. 

1. Assess whether lung cancer death rate in
“exposed” persons is different from lung cancer
death rate in “unexposed,” allowing for
competing risks.

2. Also, compare lung cancer with stroke death 
rates controlling for predictors. 

B. Advanced cancer patients either dying from 
surgery or getting hospital infection. 

1. If focus on hospital infection failure, then 
death from surgery reduces burden of hospital 
infection control required. 

C. Soldiers dying in accident or in combat. 

1. Focus on combat deaths. 

2. If entire company dies from accident on way to 
combat, then KM survival probability for 
combat death is undefined. 

3. Example illustrates that interpretation of KM 
curve may be questionable when there are 
competing risks. 

D. Limb sarcoma patients developing local 
recurrence, lung metastasis, or other metastasis. 

1. None of failure types involves death, so 
recurrent events are possible. 

2. Can avoid problem of recurrent events if focus 
only on time to first failure. 

3. Analysis of recurrent events and competing 
risks in same data not addressed. 

III. Byar Data (pages 399-400) 

A. Randomized clinical trial comparing treatments 
for prostate cancer. 

B. Three competing risks: deaths from prostate 
cancer, CVD, or other causes. 




C. Covariates other than treatment are Age, Weight
(Wt), Performance Status (PF), History of CVD
(Hx), Hemoglobin (Hg), Lesion size (SZ), and
Gleason score (SG).

D. Competing risks considered independent, for
example, death from CVD independent of death
from cancer.

IV. Method 1—Separate Models for Different Event 
Types (pages 400-403)

A. Use Cox (PH) model to estimate separate hazards 
and HRs for each failure type, where other 
competing risks are treated as censored in addition 
to usual reasons for censoring: loss to follow-up, 
withdrawal from study, or end of study. 

B. Cause-specific hazard function:

$h_c(t) = \lim_{\Delta t \to 0} P(t \le T_c < t + \Delta t \mid T_c \ge t)/\Delta t$
where $T_c$ = time-to-failure from event c, c = 1, 2, ..., C (# of event types).

C. Cox PH cause-specific model (event-type c):

$h_c(t,\mathbf{X}) = h_{0c}(t)\exp\left[\sum_{i=1}^{p} \beta_{ic} X_i\right]$

where c = 1, ..., C, and $\beta_{ic}$ allows effect of $X_i$ to
differ by event-type.

D. Byar data example: Cancer, CVD, Other Deaths are 
C = 3 competing risks. 

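1. Cause-specific (no-interaction) model for Cancer:

$$
\begin{aligned}
h_{Ca}(t,\mathbf{X}) = h_{0Ca}(t)\exp[\,&\beta_{1Ca}\,\mathrm{Rx} + \beta_{2Ca}\,\mathrm{Age} + \beta_{3Ca}\,\mathrm{Wt} + \beta_{4Ca}\,\mathrm{PF} + \beta_{5Ca}\,\mathrm{Hx} \\
&+ \beta_{6Ca}\,\mathrm{HG} + \beta_{7Ca}\,\mathrm{SZ} + \beta_{8Ca}\,\mathrm{SG}\,]
\end{aligned}
$$

where CVD and Other deaths treated as
censored observations

$HR_{Ca}(\mathrm{RX} = 1 \text{ vs. } \mathrm{RX} = 0) = \exp[\beta_{1Ca}]$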

2. Separate cause-specific (no-interaction) models
for CVD and Other.

3. Edited output presented for each cause-specific
model:

a. Cause-specific Cancer results for RX (with
CVD and Other censored):

$HR_{Ca}$(RX = 1 vs. RX = 0) = 0.575 (P = 0.001)

b. Cause-specific CVD results for RX (with
Cancer and Other censored):

$HR_{CVD}$(RX = 1 vs. RX = 0) = 1.429 (P = 0.040)

c. Cause-specific Other results for RX (with
Cancer and CVD censored):

$HR_{OTH}$(RX = 1 vs. RX = 0) = 0.560 (P = 0.038)

V. The Independence Assumption (pages 403-411) 

A. Non-informative (independent) censoring: risk 
for being censored at time t does not depend on 
prognosis for failure at time t. 

1. Typical context: no competing risks; 
homogeneous sample. 

2. Informative censoring can lead to biased results.

B. Non-informative (independent) censoring with 
competing risks. Any subject in the risk set at 
time t with a given set of covariates is just as likely 
to be censored at time t as any other subject in the 
risk set with the same set of covariates regardless 
of whether the reason for censoring is a competing 
risk, withdrawal from study, or loss to follow-up. 

1. Informative censoring: subject in risk set at 
time t is more likely to be censored from a 
competing risk than other subjects in the risk 
set at the same time. 

2. Homogeneous: subjects in the risk set having 
the same values of the predictors of interest. 

3. Synonym: Competing risks are independent. 

C. Assessing the independence assumption. 

1. No method available to directly assess the 
independence assumption nor guarantee 
unbiased estimates if independence 
assumption is violated. 

2. Consequently, the typical analysis of competing 
risks assumes that the independence 
assumption is satisfied, even if not. 

3. Strategies for considering independence 
assumption 

a. Decide that assumption holds on clinical/ 
biological/other grounds. 

b. Include in your model variables that are 
common risk factors for competing risks. 
c. Use a frailty model containing a random effect
that accounts for competing risks. 

d. Perform a sensitivity analysis by considering 
worst-case violations of independence 
assumption. 

e. All of above strategies rely on assumptions 
that cannot be verified from observed data. 

4. Example of sensitivity analysis using Byar data. 

a. Treat all subjects that die of competing risks 
CVD and Other as Cancer deaths. 

b. Treat all subjects that die of competing risks 
CVD and Other as surviving as long as the 
largest survival time in the study. 

c. Results suggest that if competing risks are not 
independent, then conclusions about the effect 
of Rx could be very different. 

d. Alternative sensitivity approach: randomly 
select a subset (e.g., 50%) of subjects who have 
CVD or Other deaths and assume everyone in 
subset dies of Cancer. 

VI. Cumulative Incidence Curves (CIC) 

(pages 412-420) 

A. Hypothetical study: n = 100 subjects, all subjects
with prostate cancer

Survt (months)   # Died   Cause
3                99       CVD
5                1        Cancer

Study goal: cause-specific Cancer survival
Censored: CVD deaths

KM_Ca: $S_{Ca}(t = 5) = 0$ and $Risk_{Ca}(T \le 5) = 1$

B. How many of 99 deaths from CVD would have
died from Cancer if not dying from CVD?

1. No answer is possible because those with CVD 
deaths cannot be observed further. 

2. Sensitivity analysis A: 99 CVD deaths die of
Cancer at t = 5.

a. KM_Ca: $S_{Ca}(t = 5) = 0$ and $Risk_{Ca}(T \le 5) = 1$
because KM assumes noninformative
censoring; that is, those censored at t = 3
were as likely to die from cancer at t = 5 as
those who were in the risk set at t = 5.

b. Same KM result as obtained for actual data.

3. Sensitivity analysis B: 99 CVD deaths survive
past t = 5.

a. KM_Ca: $S_{Ca}(t = 5) = 0.99$ and $Risk_{Ca}(T \le 5) = 0.01$.

b. Different KM result than actual data.

c. Can be derived directly from actual data as a 

marginal probability. 

4. Main point: KM survival curve may not be very 
informative. 

C. Cumulative Incidence Curve (CIC): alternative to 
KM for competing risks. 

1. Derived from cause-specific hazard function. 

2. Estimates marginal probability. 

3. Does not require independence assumption. 

4. Has a meaningful interpretation in terms of 
treatment utility. 

5. CIC formula:

$CIC(t_j) = \sum_{j'=1}^{j} \hat I_c(t_{j'}) = \sum_{j'=1}^{j} \hat S(t_{j'-1})\,\hat h_c(t_{j'})$

6. Calculation of CIC for another hypothetical 
dataset. 

7. Tests have been developed (Pepe and Mori, 
1992) for comparing the equality of CICs for 
two or more groups: analogous to log rank test. 

8. When a PH model is used to obtain hazard 
ratio estimates in the computation of a CIC, the 
independence of competing risks is required. 

9. Fine and Gray (1999) provide methodology for 
modeling the CIC (also called subdistribution 
function) with covariates using a proportional 
hazards assumption: analogous to fitting Cox 
PH model. 

10. Example of Fine and Gray output compared 
with Cox PH output for Byar data. 

VII. Conditional Probability Curves (CPC) 

(pages 420-421) 

A. $CPC_c = \Pr(T_c \le t \mid T \ge t)$ where $T_c$ = time until
event c occurs, T = time until any competing risk
event occurs.

B. Formula in terms of CIC: $CPC_c = CIC_c/(1 - CIC_{c'})$
where $CIC_{c'}$ = CIC from risks other than c.

C. Graphs of CPCs can be derived from graphs of 
CICs. 

D. Tests to compare CPCs: Pepe and Mori (1993)— 

2 curves; Lunn (1998)—g curves. 
VIII. Method 2—Lunn-McNeil (LM) Approach

(pages 421-427)

A. Allows only one Cox PH model to be fit rather than
fitting separate models for each event type
(Method 1).

B. LM uses an augmented data layout.

1. For ith subject at time $t_i$, layout has C rows of
data, where C = # event-types.

2. Dummy variables $D_1, D_2, \ldots, D_C$ are created to
distinguish the C event-types.

3. The Status variable, $e_c$, c = 1, ..., C, equals 1
for event-type c and 0 otherwise.

4. Predictors are denoted by $X_1, \ldots, X_p$.

5. Example of data layout for Byar dataset. 

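C. General form of LM model (interaction SC model).

$$
\begin{aligned}
h^{*}_{g}(t,\mathbf{X}) = h^{*}_{0g}(t)\,\exp[\,&\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p \\
&+ \delta_{21} D_2 X_1 + \delta_{22} D_2 X_2 + \cdots + \delta_{2p} D_2 X_p \\
&+ \delta_{31} D_3 X_1 + \delta_{32} D_3 X_2 + \cdots + \delta_{3p} D_3 X_p \\
&+ \cdots \\
&+ \delta_{C1} D_C X_1 + \delta_{C2} D_C X_2 + \cdots + \delta_{Cp} D_C X_p\,], \qquad g = 1, 2, \ldots, C
\end{aligned}
$$

1. LM model for event-type g = 1:

a. $h^{*}_{1}(t,\mathbf{X}) = h^{*}_{01}(t)\,\exp[\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p]$

b. $D_2 = D_3 = \cdots = D_C = 0$

c. $HR_{g=1}(X_1 = 1 \text{ vs. } X_1 = 0) = \exp[\beta_1]$

2. LM model for event-type g (> 1):

a. $h^{*}_{g}(t,\mathbf{X}) = h^{*}_{0g}(t)\,\exp[(\beta_1 + \delta_{g1})X_1 + (\beta_2 + \delta_{g2})X_2 + \cdots + (\beta_p + \delta_{gp})X_p]$

b. $HR_{g}(X_1 = 1 \text{ vs. } X_1 = 0) = \exp[\beta_1 + \delta_{g1}]$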

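D. LM model for Byar data.

1. LM model, g = 1, 2, 3:

$$
\begin{aligned}
h^{*}_{g}(t,\mathbf{X}) = h^{*}_{0g}(t)\,\exp[\,&\beta_1\,\mathrm{Rx} + \beta_2\,\mathrm{Age} + \cdots + \beta_8\,\mathrm{SG} \\
&+ \delta_{21} D_2\,\mathrm{Rx} + \delta_{22} D_2\,\mathrm{Age} + \cdots + \delta_{28} D_2\,\mathrm{SG} \\
&+ \delta_{31} D_3\,\mathrm{Rx} + \delta_{32} D_3\,\mathrm{Age} + \cdots + \delta_{38} D_3\,\mathrm{SG}\,]
\end{aligned}
$$

2. $D_2$ = CVD and $D_3$ = OTH are (0,1) dummy
variables for 3 event-types.

3. $HR_{Ca}(\mathrm{Rx} = 1 \text{ vs. } \mathrm{Rx} = 0) = \exp[\beta_1]$
$HR_{CVD}(\mathrm{Rx} = 1 \text{ vs. } \mathrm{Rx} = 0) = \exp[\beta_1 + \delta_{21}]$
$HR_{OTH}(\mathrm{Rx} = 1 \text{ vs. } \mathrm{Rx} = 0) = \exp[\beta_1 + \delta_{31}]$

4. Mathematical formulas for Wald tests and
confidence intervals require standard errors for
sums $(\hat\beta_1 + \hat\delta_{g1})$ for g = 2 and 3; requires
special syntax for computer.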

5. Output provided and compared to output from 
Method 1. 

IX. Method 2a—Alternative Lunn-McNeil (LM_alt)
Approach (pages 427-430)

A. An alternative LM formulation that gives identical
output to that obtained from Method 1 (separate
models approach).

B. Same data layout as for LM model, but only
product terms in LM_alt model.

C. General form of LM_alt (interaction SC) model:

$$
\begin{aligned}
h'_{g}(t,\mathbf{X}) = h'_{0g}(t)\,\exp[\,&\delta'_{11} D_1 X_1 + \delta'_{12} D_1 X_2 + \cdots + \delta'_{1p} D_1 X_p \\
&+ \delta'_{21} D_2 X_1 + \delta'_{22} D_2 X_2 + \cdots + \delta'_{2p} D_2 X_p \\
&+ \cdots \\
&+ \delta'_{C1} D_C X_1 + \delta'_{C2} D_C X_2 + \cdots + \delta'_{Cp} D_C X_p\,], \qquad g = 1, \ldots, C
\end{aligned}
$$

D. Hazard ratio formula involves only coefficients of
product terms for all g:

$HR_{g}(X_1 = 1 \text{ vs. } X_1 = 0) = \exp[\delta'_{g1}]$, g = 1, 2, 3

a. Statistical inference information directly provided
by LM_alt output.

E. Example of output for LM_alt model using Byar
data.

X. Method 1—Separate Models versus Method 2—LM 
Approach (pages 431-434) 

A. LM and LM_alt models allow flexibility to perform
statistical inferences about features of competing
risks model not conveniently available using
separate models (Method 1) approach.

B. LM and LM_alt models can assess whether a
no-interaction SC model is more appropriate than
the initial interaction SC LM model.

C. Example of comparison of no-interaction with
interaction SC model using Byar data.

D. LM and LM_alt models can assess whether an
unstratified LM model (called LM_U) is more
appropriate than a stratified LM model.

E. Example of LM_U model involving Byar data.

XI. Summary (pages 434-439) 

A. Competing risks survival data can be analyzed 
using Cox PH models and standard computer 
packages. 

B. There are two alternative methods that use a Cox 
PH model formulation. 

1. Fit separate models for each cause-specific 
event type, treating the remaining event types 
as censored. 

2. Use the Lunn-McNeil (LM) approach to fit a 
single model that incorporates the analysis for 
each cause-specific event. 

C. Each of the above approaches requires that 
competing risks be independent (noninformative 
censoring). 

D. Without the independence assumption, methods 
for competing risks analysis are unavailable. 

E. The Cumulative Incidence Curve (CIC) or the
Conditional Probability Curve (CPC) are
alternatives to the KM curve, when use of a KM
curve has questionable interpretation.
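
The CIC and CPC formulas in Sections VI and VII of the outline can be sketched in Python as follows. The function names and the table layout (one row per ordered failure time, holding the number at risk and the failure counts by event type) are assumptions made for this illustration. The numbers are those of the hypothetical 100-subject study in Section VI.A.

```python
def cumulative_incidence(table, cause):
    """CIC for `cause` at each ordered failure time.

    `table` rows are (t_j, n_j, failures), where n_j is the number at risk
    and `failures` maps each event type to its number of failures at t_j.
    """
    surv_prev = 1.0  # overall event-free survival S(t_{j-1})
    cic = 0.0
    curve = []
    for t, n_at_risk, failures in table:
        hazard_c = failures.get(cause, 0) / n_at_risk  # h_c(t_j)
        cic += surv_prev * hazard_c                    # add I_c(t_j)
        curve.append((t, cic))
        # Update overall survival using failures from all event types.
        surv_prev *= 1.0 - sum(failures.values()) / n_at_risk
    return curve


def conditional_probability(cic_c, cic_other):
    """CPC_c = CIC_c / (1 - CIC_c'), with CIC_c' from all other risks."""
    return cic_c / (1.0 - cic_other)


# Hypothetical study from Section VI.A of the outline:
# 99 CVD deaths at month 3, then 1 Cancer death at month 5.
table = [(3, 100, {"CVD": 99}), (5, 1, {"Cancer": 1})]
cic_cancer = cumulative_incidence(table, "Cancer")
cic_cvd = cumulative_incidence(table, "CVD")
cpc_cancer = conditional_probability(cic_cancer[-1][1], cic_cvd[-1][1])
print(cic_cancer[-1], cic_cvd[-1], cpc_cancer)
```

In this example the Cancer CIC at month 5 comes out at about 0.01, the marginal probability noted in Section VI.B.3, whereas the KM-based risk that treats CVD deaths as censored is 1.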


Practice Exercises


Answer questions 1 to 20 as true or false (circle T or F).

T F 1. A competing risk is an event-type (i.e., failure status)
that can occur simultaneously with another
event of interest on the same subject.

T F 2. An example of competing risks survival data is a
study in which patients receiving radiotherapy for
head and neck cancer may either die from their
cancer or from some other cause of death.

T F 3. If all competing risks in a given study are different
causes of death, then it is possible to have both
competing risks and recurrent events in the same
study.

T F 4. Suppose patients with advanced-stage cancer
may die after surgery before their hospital stay is
long enough to get a hospital infection. Then such
deaths from surgery reduce the hospital’s burden
of infection control.

T F 5. The typical approach for analyzing competing
risks using a Cox PH model involves fitting separate
models for each competing risk ignoring the
other competing risks.

T F 6. Suppose that a cause-specific risk of interest is
development of lung metastasis and a competing
risk is local recurrence of a lung tumor. Then a
patient who develops a local recurrence is treated
as a failure in a competing risk analysis.

T F 7. When there are no competing risks, then any
study subject in the risk set at a given time has
the same risk for failing as any other subject in
the risk set with the same values for covariate
predictors at time t.

T F 8. If, when analyzing competing risks survival data,
it is assumed that censoring is noninformative
(i.e., independent), then a subject in the risk set at
time t is as likely to fail from any competing risk
as to be lost to follow-up.

T F 9. When a sensitivity analysis indicates that a
worst-case scenario gives meaningfully different results
from an analysis that assumes independence of
competing risks, then there is evidence that the
independence assumption is violated.

T F 10. The typical competing risk analysis assumes that
competing risks are independent even if this
assumption is not true.

T F 11. The Cumulative Incidence Curve (CIC) provides
risk estimates for the occurrence of a
cause-specific event in the presence of competing risks.

T F 12. CIC = 1 - KM, where KM denotes the
Kaplan-Meier curve.

T F 13. A CIC for a cause-specific event that ignores the
control of covariates does not require the
assumption of independent competing risks.

T F 14. A Conditional Probability Curve (CPC) gives the
probability of experiencing an event c by time t,
from risk c, given that an individual has
experienced any of the other competing risks by time t.

T F 15. If $CIC_c = .4$, then CPC = .4/.6 = .667.

T F 16. The Lunn-McNeil (LM) approach fits a single
stratified Cox model using an augmented dataset
to obtain the same results as obtained by fitting
separate Cox models for each cause-specific
competing risk.

T F 17. An advantage of the Lunn-McNeil (LM) approach
over the approach that fits separate Cox models is
that the LM approach allows for testing whether
a no-interaction SC model might be preferable to
an interaction SC model.

T F 18. Given the LM model stratified on two cause-specific
events, Cancer and CVD:

$h^{*}_{g}(t,\mathbf{X}) = h^{*}_{0g}(t)\,\exp[\beta_1\,\mathrm{Rx} + \beta_2\,\mathrm{Age} + \delta_1 (D \times \mathrm{Rx}) + \delta_2 (D \times \mathrm{Age})]$,
g = 1, 2 where
D = 0 if Ca and = 1 if CVD

then

$HR_{CVD}(\mathrm{Rx} = 1 \text{ vs. } \mathrm{Rx} = 0) = \exp[\beta_1 + \delta_1]$

T F 19. Given the LM_alt model for two cause-specific
events, Cancer and CVD:

$h'_{g}(t,\mathbf{X}) = h'_{0g}(t)\,\exp[\delta'_{11} D_1\,\mathrm{Rx} + \delta'_{12} D_1\,\mathrm{Age} + \delta'_{21} D_2\,\mathrm{Rx} + \delta'_{22} D_2\,\mathrm{Age}]$,
g = 1, 2 where

$D_1$ = 1 if Ca or 0 if CVD, and
$D_2$ = 0 if Ca or 1 if CVD,

then

$HR_{CVD}(\mathrm{Rx} = 1 \text{ vs. } \mathrm{Rx} = 0) = \exp[\delta'_{21}]$

T F 20. The LM_U model that would result if the LM model
of Question 18 were changed to an unstratified
Cox PH model can be written as follows.

$h^{*}(t,\mathbf{X}) = h^{*}_{0}(t)\,\exp[\beta^{*}_{1}\,\mathrm{Rx} + \beta^{*}_{2}\,\mathrm{Age} + \delta^{*}_{21} (D \times \mathrm{Rx}) + \delta^{*}_{22} (D \times \mathrm{Age})]$

Consider a hypothetical study of the effect of a bone marrow
transplant for leukemia on leukemia-free survival, where
transplant failures can be of one of two types: relapse of
leukemia and nonrelapse death (without prior relapse of
leukemia). Suppose that in hospital A, 100 patients undergo
such a transplant and that within the first 4 years
post-transplant, 60 die without relapse by year 2 and 20 relapse
during year 4. Suppose that in hospital B, 100 patients
undergo such a transplant but post-transplant, there are 20
nonrelapse deaths by year 1, 15 relapses during year 2, 40
nonrelapse deaths between years 3 and 4, and 5 relapses during
year 4.

21. What are the competing risks in this study? 

22. What is the proportion of initial patients in hospitals 
A and B, respectively, that have leukemia relapse by 4 
years? 

The following tables provide the Kaplan-Meier curves for 
relapse of leukemia for each study. 


Hospital A

t_j   n_j   m_j   q_j   S(t_j)
0     100   0     60    1
2     40    0     0     1
4     40    20    20    .5

Hospital B

t_j   n_j   m_j   q_j   S(t_j)
0     100   0     20    1
1     80    0     0     1
2     80    15    0     0.8125
3     65    0     40    0.8125
4     25    5     20    0.65


23. How have both tables treated the competing risk for 
nonrelapse death in the calculation of the KM probabili¬ 
ties? 

24. Why are the KM probabilities different at 4 years for each 
hospital? 

25. Compute the CIC curves for each hospital using the
following tables.

Hospital A

t_j   n_j   m_j   h_ca(t_j)   S(t_{j-1})   I_ca(t_j)   CIC(t_j)
0     100   0     0
2     40    0     0           1            0           0
4     40    20    -           -            -           -

Hospital B

t_j   n_j   m_j   h_ca(t_j)   S(t_{j-1})   I_ca(t_j)   CIC(t_j)
0     100   0     0
1     80    0     0           1            0           0
2     80    15
3     65    0
4     25    5     -           -            -           -

26. Why are the CIC probabilities the same at 4 years?

Consider a hypothetical study to assess the effect of a new
hospital infection control strategy for patients who undergo
heart transplant surgery in a given hospital. The exposure
variable of interest is a binary variable Group (G): G = 0 for
those patients receiving heart transplants from 1992 through
1995 when the previous hospital control strategy was used;
G = 1 for those patients receiving heart transplants from
1996 through 1999 when the new hospital infection control
strategy was adopted. The primary event of interest is getting
a hospital infection after surgery. A competing risk is death
during recovery from surgery without getting a hospital
infection. Control variables being considered are tissue mismatch
score (TMS) at transplant and AGE at transplant. The outcome
variable of interest is time (DAYS after surgery) until a
patient developed a hospital infection.

27. State a cause-specific no-interaction Cox PH model for 
assessing the effect of group status (G) on time until a 
hospital infection event. 

28. When fitting the model given in Question 27, which
patients should be considered censored?

29. Describe or provide a table that would show how the data 
on the ith patient should be augmented for input into a 
Lunn-McNeil (LM) model for this analysis. 

30. State a LM model that can be used with an augmented
dataset that will provide identical results to those
obtained from using the model of Question 27.

31. For the LM model of Question 30, what is the formula
for the hazard ratio for the group effect G, controlling for
TMS and AGE?

32. Describe how you would test whether a no-interaction SC
LM model would be more appropriate than an interaction
SC LM model.

33. State a LM_alt model that can be used with an augmented
dataset that will provide identical results to those
obtained from using the model of Question 27.

34. For the LM_alt model of Question 33, what is the formula
for the hazard ratio for the group effect G, controlling for
TMS and AGE?


Test


The dataset shown below describes a hypothetical study of
recurrent bladder cancer. The entire dataset contained 53
patients, each with local bladder cancer tumors who are
followed for up to 30 months after transurethral surgical
excision. Three competing risks being considered are local
recurrence of bladder cancer tumor (event = 1), bladder
metastasis (event = 2), or other metastasis (event = 3). The
variable time denotes survival time up to the occurrence of
one of the three events or censorship from loss to follow-up,
withdrawal, or end of study. The exposure variable of
interest is drug treatment status (tx, 0 = placebo, 1 = treatment
A). The covariates listed here are initial number of tumors
(num) and initial size of tumors (size) in centimeters.


id   event   time   tx   num   size
1    1       8      1    1     1
2    0       1      0    1     3
3    0       4      1    2     1
4    0       7      0    1     1
5    0       10     1    5     1
6    2       6      0    4     1
7    0       10     1    4     1
8    0       14     0    1     1
9    0       18     1    1     1
10   3       5      0    1     3
11   0       18     1    1     3
12   1       12     0    1     1
13   2       16     1    1     1
14   0       18     0    1     1
15   0       23     1    3     3
16   3       10     0    1     3
17   1       15     1    1     3
18   0       23     0    1     3
19   2       3      1    1     1
20   3       16     0    1     1
21   1       23     1    1     1
22   1       3      0    3     1
23   2       9      1    3     1
24   2       21     0    3     1
25   0       23     1    3     1
26   3       7      0    2     3
27   3       10     1    2     3
28   1       16     0    2     3
29   1       24     1    2     3
30   1       3      0    1     1
31   2       15     1    1     1
32   2       25     0    1     1
33   0       26     1    1     2
34   1       1      0    8     1
35   0       26     1    8     1
36   1       2      0    1     4
37   1       26     1    1     4
38   1       25     0    1     2
39   0       28     1    1     2
40   0       29     0    1     4
41   0       29     1    1     2
42   0       29     0    4     1
43   3       28     1    1     6
44   1       30     0    1     6
45   2       2      1    1     5
46   1       17     0    1     5
47   1       22     1    1     5
48   0       30     0    1     5
49   3       3      1    2     1
50   2       6      0    2     1
51   3       8      1    2     1
52   3       12     0    2     1
53   0       30     1    2     1


1. Suppose you wish to use these data to determine the effect
of tx on survival time for the cause-specific event of a
local recurrence of bladder cancer. State a no-interaction
Cox PH model for assessing this relationship that adjusts
for the covariates num and size.

2. When fitting the model given in Question 1, which
subjects are considered censored?

3. How would you modify your answers to Questions 1 and
2 if you were interested in the effect of tx on survival time
for the cause-specific event of finding metastatic bladder
cancer?

4. For the model considered in Question 1, briefly describe
how to carry out a sensitivity analysis to determine how
badly the results from fitting this model might be biased
if the assumption of independent competing risks is
violated.

5. The following two tables provide information necessary
for calculating CIC curves for local recurrence of bladder
cancer (event = 1) separately for each treatment group.
The CIC formula used for both tables is given by the
expression

$CIC_1(t_j) = \sum_{j'=1}^{j} \hat I_1(t_{j'}) = \sum_{j'=1}^{j} \hat S(t_{j'-1})\,\hat h_1(t_{j'})$

where $\hat h_1(t_j) = m_{1j}/n_j$, $m_{1j}$ is the number of local recurrent
failures at time $t_j$ (labeled $d_{1j}$ in the tables), and $\hat S(t_{j-1})$
is the overall (event-free) survival probability for failure from either of the
two competing risks at time $t_{j-1}$.


tx = 1 (Treatment A)

t_j   n_j   d_1j   h_1(t_j)   S(t_{j-1})   I_1(t_j)   CIC_1(t_j)
0     27    0      0          -            -          -
2     27    0      0          1            0          0
3     26    0      0          .9630        0          0
4     24    0      0          .8889        0          0
8     23    1      .0435      .8889        .0387      .0387
9     21    0      0          .8116        0          .0387
10    20    0      0          .7729        0          .0387
15    17    1      .0588      .7343        .0432      .0819
16    15    0      0          .6479        0          .0819
18    14    0      0          .6047        0          .0819
22    12    1      .0833      .6047        .0504      .1323
23    11    1      .0910      .5543        .0504      .1827
24    8     1      .1250      .5039        .0630      .2457
26    7     1      .1429      .4409        .0630      .3087
28    4     0      0          .3779        0          .3087
29    2     0      0          .2835        0          .3087
30    1     0      0          .2835        0          .3087

tx = 0 (Placebo)

t_j   n_j   d_1j   h_1(t_j)   S(t_{j-1})   I_1(t_j)   CIC_1(t_j)
0     26    0      0          -            -          -
1     26    1      .0400      1            .0400      .0400
2     24    1      .0417      .9615        .0400      .0800
3     23    2      .0870      .9215        .0801      .1601
5     21    0      0          .8413        0          .1601
6     20    0      0          .8013        0          .1601
7     18    0      0          .7212        0          .1601
10    16    0      0          .6811        0          .1601
12    15    1      .0667      .6385        .0426      .2027
14    13    0      0          .6835        0          .2027
16    12    1      .0833      .5534        .0461      .2488
17    10    1      .1000      .4612        .0461      .2949
18    9     0      0          .4150        0          .2949
21    8     0      0          .4150        0          .2949
23    7     0      0          .3632        0          .2949
25    6     1      .1667      .3632        .0605      .3554
29    4     0      0          .2421        0          .3554
30    2     1      0          .2421        0          .3554


a. Verify the $CIC_1$ calculation provided at failure time
$t_j$ = 8 for persons in the treatment group (tx = 1); that
is, use the original data to compute $\hat h_1(t_j)$, $\hat S(t_{j-1})$,
$\hat I_1(t_j)$, and $CIC_1(t_j)$, assuming that the calculations
made up to this failure time are correct.

b. Verify the $CIC_1$ calculation provided at failure time
$t_j$ = 25 for persons in the placebo group (tx = 0).

c. Interpret the $CIC_1$ values obtained for both the treatment
and placebo groups at $t_j$ = 30.

d. How can you calculate the $CPC_1$ values for both treatment
and placebo groups at $t_j$ = 30?

6. The following output was obtained using separate
models for each of the 3 event-types.

Event = 1

Var    DF   Coef      Std.Err.   p > |z|   Haz.ratio
tx     1    -0.6258   0.5445     0.2504    0.535
num    1    0.0243    0.1900     0.8983    1.025
size   1    0.0184    0.1668     0.9120    1.125

Event = 2

Var    DF   Coef      Std.Err.   p > |z|   Haz.ratio
tx     1    -0.0127   0.6761     0.9851    0.987
num    1    -0.1095   0.2281     0.6312    0.896
size   1    -0.6475   0.3898     0.0966    0.523

Event = 3

Var    DF   Coef      Std.Err.   p > |z|   Haz.ratio
tx     1    -0.3796   0.6770     0.5750    0.684
num    1    -0.1052   0.3135     0.7372    0.900
size   1    -0.0238   0.2177     0.9128    0.976

a. What is the effect of treatment on survival from having
a local recurrence of bladder cancer, and is it
significant?

b. What is the effect of treatment on survival from
developing metastatic bladder cancer, and is it significant?

c. What is the effect of treatment on survival from other
metastatic cancer, and is it significant?

7. Below is the output from fitting a LM model to the
bladder cancer data.

Var      DF   Coef      Std.Err.   p > |z|   Haz.ratio
txd2     1    0.6132    0.8681     0.4800    1.846
txd3     1    0.2463    0.8688     0.7768    1.279
numd2    1    -0.1337   0.2968     0.6523    0.875
numd3    1    -0.1295   0.3666     0.7240    0.879
sized2   1    -0.6660   0.4239     0.1162    0.514
sized3   1    -0.0423   0.2742     0.8775    0.959
tx       1    -0.6258   0.5445     0.2504    0.535
num      1    0.0243    0.1900     0.8983    1.025
size     1    0.0184    0.1668     0.9120    1.125


a. State the hazard model formula for the LM model 
used for the above output. 

b. Determine the estimated hazard ratios for the effect 
of each of the 3 cause-specific events based on the 
above output. 

c. Verify that the estimated hazard ratios computed in 
Part b are identical to the hazard ratios computed in 
Question 6. 

8. Below is the output from fitting a LM_alt model to the
bladder cancer data.

Var      DF   Coef      Std.Err.   p > |z|   Haz.ratio
txd1     1    -0.6258   0.5445     0.2504    0.535
txd2     1    -0.0127   0.6761     0.9851    0.987
txd3     1    -0.3796   0.6770     0.5750    0.684
numd1    1    0.0243    0.1900     0.8983    1.025
numd2    1    -0.1095   0.2281     0.6312    0.896
numd3    1    -0.1052   0.3135     0.7372    0.900
sized1   1    0.0184    0.1668     0.9120    1.125
sized2   1    -0.6475   0.3898     0.0966    0.523
sized3   1    -0.0238   0.2177     0.9128    0.976

a. State the hazard model formula for the LM_alt model
used for the above output.

b. Determine the estimated hazard ratios for the effect 
of each of the 3 cause-specific events based on the 
above output. 

c. Verify that the estimated hazard ratios computed in 
Part b are identical to the hazard ratios computed in 
Questions 6 and 7. 
9. State the formula for a no-interaction SC LM model for
these data.

10. Describe how you would test whether a no-interaction
SC LM model would be more appropriate than an interaction
SC LM model.


Answers to Practice Exercises

1. F: Only one competing risk event can occur at a given time. 

2. T 

3. F: You can die only once. 

4. T 

5. F: Competing risks must be treated as censored observations,
rather than ignored.

6. F: A patient who develops a local recurrence will be treated
as censored.

7. F: The statement would be true providing censoring is
noninformative.

8. T

9. F: A sensitivity analysis can never provide explicit evidence
about whether the independence assumption is satisfied;
it can only suggest how biased the results might be if the
assumption is not satisfied.

10. T 

11. T 

12. F: The formula is correct only if there is one risk. See 
Section V in the text for the general formula. 

13. T 

14. F: The correct statement should be: CPC gives the probability
of experiencing an event c by time t, from risk c,
given that an individual has not experienced any of the
other competing risks by time t.

15. F: The correct formula for CPC is
$\mathrm{CPC}_c = \mathrm{CIC}_c / (1 - \mathrm{CIC}_{c'})$,
where $\mathrm{CIC}_c = 0.4$ and $\mathrm{CIC}_{c'}$ is the CIC from risks
other than c; the latter is not necessarily equal to 0.4.

16. T 

17. T 

18. T 

19. T

20. F: The correct LM_U model is

$$h^{*}(t,X) = h^{*}_{0}(t)\exp[\gamma_{1}D + \beta_{1}\mathrm{Rx} + \beta_{2}\mathrm{Age} + \delta_{21}(D \times \mathrm{Rx}) + \delta_{22}(D \times \mathrm{Age})]$$

21. The competing risks are (1) relapse of leukemia and (2)
nonrelapse death.

22. 20/100 = 0.2.

23. Both tables have treated the competing risks as if they were 
censored observations. 

24. The KM probabilities are different for the two hospitals 
because the competing risks contribute a different pattern 
of censorship in the two hospitals. 

25. The CIC curves for each hospital are calculated as follows.


Hospital A

t_j   n_j   m_j   h_ca(t_j)   S(t_j-1)   I_ca(t_j)   CIC(t_j)
0     100    0    0           -          -           -
2      40    0    0           1          0           0
4      40   20    0.5         0.4        0.20        0.20

Hospital B

t_j   n_j   m_j   h_ca(t_j)   S(t_j-1)   I_ca(t_j)   CIC(t_j)
0     100    0    0           -          -           -
1      80    0    0           1          0           0
2      80   15    0.1875      0.8        0.15        0.15
3      65    0    0           0.65       0           0.15
4      25    5    0.20        0.25       0.05        0.20
26. The CIC probabilities are the same at 4 years because
they give marginal probabilities that are not influenced
by the pattern of censorship of the competing risks that
are treated as censored. In hospital B, for example, the
marginal probability of 0.15 at year 2 is given by the proportion
of the initial risk set of 100 subjects that had a
relapse of leukemia by year 2, regardless of the number
of nonrelapse deaths prior to year 2. Similarly for hospital B,
the marginal probability of 0.20 at year 4 adds to the
marginal probability at year 2 the proportion of the initial
risk set of 100 subjects that had a relapse between year 2
and year 4, regardless of the number of nonrelapse deaths
that occurred between year 2 and year 4.
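
As a check on the arithmetic in answers 25 and 26, the following minimal Python sketch recomputes each CIC column from the tabled values. The function name `cic_curve` and the row layout are illustrative assumptions, not part of the text; the inputs are the cause-specific hazard and the overall survival just before each time, copied from the two hospital tables.

```python
def cic_curve(rows):
    """Return cumulative incidence values for one risk.

    rows: list of (hazard_c, surv_prev) pairs, one per failure time, where
    hazard_c is the cause-specific hazard estimate m_j / n_j and surv_prev
    is the overall survival probability just before that time.
    """
    cic, total = [], 0.0
    for hazard_c, surv_prev in rows:
        total += surv_prev * hazard_c  # incidence contributed at this time
        cic.append(round(total, 4))
    return cic


# Values copied from the Hospital A and Hospital B tables (times after 0).
hospital_a = [(0.0, 1.0), (0.5, 0.4)]
hospital_b = [(0.0, 1.0), (0.1875, 0.8), (0.0, 0.65), (0.20, 0.25)]

cic_a = cic_curve(hospital_a)
cic_b = cic_curve(hospital_b)
print("Hospital A:", cic_a)
print("Hospital B:", cic_b)
```

Both curves end at 0.20 at year 4, which is the point made in answer 26.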

27. $h_{HI}(t,X) = h_{0}(t)\exp[\beta_{1HI}\,G + \beta_{2HI}\,TMS + \beta_{3HI}\,AGE]$,
where HI denotes a hospital infection event

28. Patients who die after surgery without developing a hospital
infection are censored. Also censored are any patients
who are either lost to follow-up or withdraw from the study,
although such patients are unlikely.

29. Augmented Data for LM Approach


Subj   Stime   Status   D1   D2   G     TMS     AGE
i      t_i     e_1i     1    0    G_i   TMS_i   AGE_i
i      t_i     e_2i     0    1    G_i   TMS_i   AGE_i


where e_1i = 1 if the ith subject develops a
             hospital infection, 0 otherwise
      e_2i = 1 if the ith subject dies after surgery,
             0 otherwise
      D1 = indicator for hospital infection event
      D2 = indicator for death after surgery event

30. $h^{*}_{g}(t,X) = h^{*}_{0g}(t)\exp[\beta_{1}G + \beta_{2}TMS + \beta_{3}AGE + \delta_{21}D_{2}G + \delta_{22}D_{2}TMS + \delta_{23}D_{2}AGE]$, $g = 1, 2$

31. $HR_{HI}(RX = 1 \text{ vs. } RX = 0) = \exp[\beta_{1}]$

32. Carry out a likelihood ratio test to compare the following
two models.

Full (SC interaction LM) model:
$h^{*}_{g}(t,X) = h^{*}_{0g}(t)\exp[\beta_{1}G + \beta_{2}TMS + \beta_{3}AGE + \delta_{21}D_{2}G + \delta_{22}D_{2}TMS + \delta_{23}D_{2}AGE]$, $g = 1, 2$

Reduced (no-interaction SC LM) model:
$h^{*}_{g}(t,X) = h^{*}_{0g}(t)\exp[\beta_{1}G + \beta_{2}TMS + \beta_{3}AGE]$, $g = 1, 2$

LR test statistic $= -2\ln L_{R} - (-2\ln L_{F})$ is distributed $\chi^{2}_{3}$
under $H_{0}$: no-interaction model
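
The likelihood ratio test in answer 32 can be sketched numerically as follows. This is only an illustration: the two log likelihood values are hypothetical (no fitted values are given for these models), and the helper names `chi2_sf_3df` and `lr_test` are assumptions introduced here. The p-value uses the closed-form upper tail of a chi-square distribution with 3 degrees of freedom.

```python
import math


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


def lr_test(loglik_reduced, loglik_full):
    """Return (LR statistic, p-value) when the models differ by 3 parameters."""
    stat = -2.0 * loglik_reduced - (-2.0 * loglik_full)
    return stat, chi2_sf_3df(stat)


# Hypothetical log likelihoods, chosen only to illustrate the calculation.
stat, p_value = lr_test(loglik_reduced=-250.0, loglik_full=-246.0)
print(round(stat, 2), round(p_value, 4))
```

A small p-value would favor the interaction model; a large one would support the simpler no-interaction model.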

33. $h'_{g}(t,X) = h'_{0g}(t)\exp[\beta'_{1}D_{1}G + \beta'_{2}D_{1}TMS + \beta'_{3}D_{1}AGE + \delta'_{21}D_{2}G + \delta'_{22}D_{2}TMS + \delta'_{23}D_{2}AGE]$, $g = 1, 2$

34. $HR_{HI}(RX = 1 \text{ vs. } RX = 0) = \exp[\beta'_{1}]$


Computer Appendix: Survival Analysis on the Computer


In this appendix, we provide examples of computer programs
for carrying out the survival analyses described in this text.
This appendix does not give an exhaustive survey of all computer
packages currently available, but rather is intended to
describe the similarities and differences among three of the
most widely used packages. The software packages that we
describe are Stata (version 7.0), SAS (version 8.2), and SPSS
(version 11.5). A complete description of these packages is beyond
the scope of this appendix. Readers are referred to the
built-in Help functions for each program for further information.
Datasets

Most of the computer syntax and output presented in this appendix
are obtained from running step-by-step survival analyses
on the “addicts” dataset. The other dataset that is utilized
in this appendix is the “bladder cancer” dataset for analyses
of recurrent events. The addicts and bladder cancer data are
described below and can be downloaded from our Web site
at http://www.sph.emory.edu/~dkleinb/surv2.htm. On this
Web site we also provide many of the other datasets that have
been used in the examples and exercises throughout this text.
The data on our Web site are provided in several forms: (1) as
Stata datasets (with a .dta extension), (2) as SAS version 8.2
datasets (with a .sas7bdat extension), (3) as SPSS datasets
(with a .sav extension), and (4) as text datasets (with a .dat
extension).

Addicts Dataset (addicts.dat) 

In a 1991 Australian study by Caplehorn et al., two methadone
treatment clinics for heroin addicts were compared to assess
patient time remaining under methadone treatment. A patient’s
survival time was determined as the time, in days, until
the person dropped out of the clinic or was censored. The two
clinics differed according to their live-in policies for patients.
The variables are defined as follows.

ID—Patient ID. 

SURVT—The time (in days) until the patient dropped out of the 
clinic or was censored. 

STATUS—Indicates whether the patient dropped out of the clinic 
(coded 1) or was censored (coded 0). 

CLINIC—Indicates which methadone treatment clinic the patient
attended (coded 1 or 2).

PRISON—Indicates whether the patient had a prison record 
(coded 1) or not (coded 0). 

DOSE—A continuous variable for the patient’s maximum 
methadone dose (mg/day). 

Bladder Cancer Dataset (bladder.dat) 

The bladder cancer dataset contains recurrent event outcome 
information for 86 cancer patients followed for the recurrence 
of bladder cancer tumor after transurethral surgical excision 
(Byar and Green, 1980). The exposure of interest is the effect 
of the drug treatment of thiotepa. Control variables are the 
initial number and initial size of tumors. The data layout is 
suitable for a counting processes approach. The variables are 
defined as follows. 
ID—Patient ID (may have multiple observations for the same
subject).

EVENT—Indicates whether the patient had a tumor (coded 1) or
not (coded 0).

INTERVAL—A counting number representing the order of the
time interval for a given subject (coded 1 for the subject’s first
time interval, 2 for a subject’s second time interval, etc.).

START—The starting time (in months) for each interval. 

STOP—The time of event (in months) or censorship for each 
interval. 

TX—Treatment status (coded 1 for treatment with thiotepa and 
0 for the placebo). 

NUM—The initial number of tumor(s). 

SIZE—The initial size (in centimeters) of the tumor. 


Software 

What follows is a detailed explanation of the code and output 
necessary to perform the type of survival analyses described in 
this text. The rest of this appendix is divided into three broad 
sections, one for each of the following software packages, 

A. Stata, 

B. SAS, and 

C. SPSS. 

Each of these sections is self-contained, allowing the reader 
to focus on the particular statistical package of his or her 
interest. 


A. Stata 

Analyses using Stata are obtained by typing the appropriate
statistical commands in the Stata Command window or in the
Stata Do-file Editor window. The key commands used to perform
the survival analyses are listed below. These commands
are case sensitive and lowercase letters should be used.

stset—Declares data in memory to be survival data. Used to
define the “time-to-event” variable, the “status” variable, and
other relevant survival variables. Other Stata commands beginning
with st utilize these defined variables.

sts list—Produces Kaplan-Meier (KM) or Cox adjusted survival
estimates in the output window. The default is KM survival
estimates.

sts graph—Produces plots of Kaplan-Meier (KM) survival estimates.
This command can also be used to produce Cox adjusted
survival plots.

sts generate—Creates a variable in the working dataset that contains
Kaplan-Meier or Cox adjusted survival estimates.

sts test—Used to perform statistical tests for the equality of survival
functions across strata.

stphplot—Produces plots of log-log survival against the log of
time for the assessment of the proportional hazards (PH) assumption.
The user can request KM log-log survival plots or
Cox adjusted log-log survival plots.

stcoxkm—Produces KM survival plots and Cox adjusted survival
plots on the same graph.

stcox—Used to run a Cox proportional hazard model, a stratified
Cox model, or an extended Cox model (i.e., containing time-varying
covariates).

stphtest—Performs statistical tests on the PH assumption based
on Schoenfeld residuals. Use of this command requires that a
Cox model be previously run with the command stcox and the
schoenfeld() option.

streg—Used to run parametric survival models. 

Four windows will appear when Stata is opened. These windows
are labeled Stata Command, Stata Results, Review, and
Variables. The user can click on File → Open to select a working
dataset for analysis. Once a dataset is selected, the names
of its variables appear in the Variables window. Commands
are entered in the Stata Command window. The output generated
by commands appears in the Results window after the
return key is pressed. The Review window preserves a history
of all the commands executed during the Stata session.
The commands in the Review window can be saved, copied, or
edited as the user desires. Commands can also be run from the
Review window by double-clicking on the command. Commands
can also be saved in a file by clicking on the log button
on the Stata tool bar.

Alternatively, commands can be typed or pasted into the Do-file
Editor. The Do-file Editor window is activated by clicking
on Window → Do-file Editor or by simply clicking on the
Do-file Editor button on the Stata tool bar. Commands are
executed from the Do-file Editor by clicking on Tools → Do.
The advantage of running commands from the Do-file Editor
is that commands need not be entered and executed one at a
time as they do from the Stata Command window. The Do-file
Editor serves a similar function to that of the program
editor in SAS. In fact, by typing #delimit ; in the Do-file Editor
window, the semicolon becomes the delimiter for completing
Stata statements (as in SAS) rather than the default carriage
return.
The survival analyses demonstrated in Stata are as follows.

1. Estimating survival functions (unadjusted) and comparing
them across strata;

2. Assessing the PH assumption using graphical approaches;

3. Running a Cox PH model;

4. Running a stratified Cox model;

5. Assessing the PH assumption with a statistical test;

6. Obtaining Cox adjusted survival curves;

7. Running an extended Cox model;

8. Running parametric models;

9. Running frailty models; and

10. Modeling recurrent events.

The first step is to activate the addicts dataset by clicking on
File → Open and selecting the Stata dataset, addicts.dta.
Once this is accomplished, you will see the command use
"addicts.dta", clear in the Review window and Results window.
This indicates that the addicts dataset is activated in
Stata's memory.

To perform survival analyses, you must indicate which vari¬ 
able is the “time-to-event” variable and which variable is the 
“status” variable. Rather than program this in every survival 
analysis command, Stata provides a way to program it once 
with the stset command. All survival commands beginning 
with st utilize the survival variables defined by stset as long 
as the dataset remains in active memory. The code to define 
the survival variables for the addicts data is as follows. 

stset survt, failure(status ==1) id(id) 

Following the word stset comes the name of the “time-to-event”
variable. Options for Stata commands follow a comma.
The first option used is to define the variable and value that
indicates an event (or failure) rather than a censorship. Without
this option, Stata assumes all observations had an event (i.e.,
no censorships). Notice two equal signs are used to express
equality. A single equal sign is used to designate assignment.
The next option defines the id variable as the variable, ID. This
is unnecessary with the addicts dataset because each observation
represents a different patient (cluster). However, if there
were multiple observations and multiple events for a single
subject (cluster), Stata can provide robust variance estimates
appropriate for clustered data.





The stset command will add four new variables to the dataset.
Stata interprets these variables as follows:

_t—the “time-to-event” variable;

_d—the “status” variable (coded 1 for an event, 0 for a censorship);

_t0—the beginning “time” variable. All observations start at time
0 by default; and

_st—indicates which observations are used in the analysis. All
observations are used (coded 1) by default.

To see the first 10 observations printed in the output window, 
enter the command: 

list in 1/10 

The command stdes provides descriptive information (output
below) of survival time.

stdes

         failure _d:  status
   analysis time _t:  survt
                 id:  id

                                |------------ per subject ------------|
Category                total        mean        min     median      max

no. of subjects           238
no. of records            238           1          1          1        1

(first) entry time                      0          0          0        0
(final) exit time                402.5714          2      367.5     1076

subjects with gap           0
time on gap if gap          0
time at risk            95812    402.5714          2      367.5     1076

failures                  150    .6302521          0          1        1


The commands strate and stir can be used to obtain incidence
rate comparisons for different categories of specified variables.
The strate command lists the incidence rates by CLINIC
and the stir command gives rate ratios and rate differences.
Type the following commands one at a time (output omitted).

strate clinic 
stir clinic 
For the survival analyses that follow, it is assumed that the
command stset has been run for the addicts dataset, as
demonstrated above.


1. ESTIMATING SURVIVAL FUNCTIONS (UNADJUSTED) AND COMPARING
THEM ACROSS STRATA

To obtain Kaplan-Meier survival estimates use the command
sts list. The code and output follow.


sts list

         failure _d:  status == 1
   analysis time _t:  survt
                 id:  id

          Beg.           Net    Survivor     Std.
 Time    Total   Fail   Lost    Function    Error     [95% Conf. Int.]

    2      238      0      2      1.0000
    7      236      1      0      0.9958   0.0042     0.9703    0.9994
   13      235      1      0      0.9915   0.0060     0.9665    0.9979
   17      234      1      0      0.9873   0.0073     0.9611    0.9959
   19      233      1      0      0.9831   0.0084     0.9555    0.9936
   26      232      1      0      0.9788   0.0094     0.9499    0.9911
   28      231      0      2      0.9788   0.0094     0.9499    0.9911
   29      229      1      0      0.9745   0.0103     0.9442    0.9885
   30      228      1      0      0.9703   0.0111     0.9386    0.9857
   33      227      1      0      0.9660   0.0118     0.9331    0.9828
  ...
  905        8      0      1      0.1362   0.0364     0.0748    0.2159
  932        7      0      2      0.1362   0.0364     0.0748    0.2159
  944        5      0      1      0.1362   0.0364     0.0748    0.2159
  969        4      0      1      0.1362   0.0364     0.0748    0.2159
 1021        3      0      1      0.1362   0.0364     0.0748    0.2159
 1052        2      0      1      0.1362   0.0364     0.0748    0.2159
 1076        1      0      1      0.1362   0.0364     0.0748    0.2159
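
The Survivor Function column in the sts list output can be reproduced by hand from the Beg. Total and Fail columns using the Kaplan-Meier product-limit formula. The Python sketch below is only an arithmetic check, not a substitute for sts list; the function name `kaplan_meier` is an assumption, and the input rows are the first ten rows of the listing.

```python
def kaplan_meier(rows):
    """rows: list of (time, n_at_risk, n_failed); returns [(time, S(t))]."""
    surv, out = 1.0, []
    for time, n_at_risk, n_failed in rows:
        # Multiply by the conditional probability of surviving past this time.
        surv *= 1.0 - n_failed / n_at_risk
        out.append((time, round(surv, 4)))
    return out


# (Time, Beg. Total, Fail) for the first ten rows of the sts list output.
rows = [(2, 238, 0), (7, 236, 1), (13, 235, 1), (17, 234, 1), (19, 233, 1),
        (26, 232, 1), (28, 231, 0), (29, 229, 1), (30, 228, 1), (33, 227, 1)]

km = kaplan_meier(rows)
for time, surv in km:
    print(time, surv)
```

Rows with no failures (times 2 and 28) leave the estimate unchanged; they only shrink the risk set for later times.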


If we wish to stratify by CLINIC and compare the survival
estimates side by side for specified time points, we use the
by() and compare options, with the time points listed in at().
The code and output follow.
sts list, by(clinic) compare at(0 20 to 1080)

         failure _d:  status
   analysis time _t:  survt
                 id:  id

              Survivor Function
clinic             1           2

time      0   1.0000      1.0000
         20   0.9815      0.9865
         40   0.9502      0.9595
         60   0.9189      0.9459
         80   0.9000      0.9320
        100   0.8746      0.9320
        120   0.8681      0.9179
        140   0.8422      0.9038
        160   0.8093      0.8753
        180   0.7690      0.8466
        200   0.7420      0.8323
        220   0.6942      0.8179
        ...
        840   0.0725      0.5745
        860   0.0543      0.5745
        880   0.0543      0.5171
        900   0.0181      0.5171
        920               0.5171
        940               0.5171
        960               0.5171
        980               0.5171
       1000               0.5171
       1020               0.5171
       1040               0.5171
       1060               0.5171
       1080

Notice that the survival estimates for CLINIC = 2 are higher than
those for CLINIC = 1. Other survival times could have been requested
using the at() option.

To graph the Kaplan-Meier survival function (against time), 
use the code 


sts graph 
The code and output that provide a graph of the Kaplan-Meier
survival function stratified by CLINIC follow.

sts graph, by(clinic)

[Figure: Kaplan-Meier survival estimates, by clinic]



The failure option graphs the failure function (the cumulative 
risk) rather than the survival (zero to one rather than one to 
zero). The code follows (output omitted). 

sts graph, by(clinic) failure 

The code to run the log rank test on the variable CLINIC (and
output) follows.

sts test clinic

         failure _d:  status == 1
   analysis time _t:  survt
                 id:  id

Log-rank test for equality of survivor functions

          Events      Events
clinic    observed    expected

1              122       90.91
2               28       59.09

Total          150      150.00

          chi2(1) =     27.89
          Pr>chi2 =    0.0000
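
To make the observed and expected columns concrete, the following Python sketch computes a two-group log rank chi-square statistic from individual survival times. It is a minimal illustration under stated assumptions: the function name `logrank_chi2` and the four-subject toy dataset are hypothetical and are not taken from the addicts data, and the code assumes at least one failure time at which both groups are at risk.

```python
def logrank_chi2(times, events, groups):
    """Two-group log rank chi-square (1 df); groups are coded 0 or 1."""
    data = list(zip(times, events, groups))
    observed = expected = variance = 0.0
    for t in sorted({ti for ti, ei, _ in data if ei == 1}):
        n = sum(1 for ti, _, _ in data if ti >= t)            # at risk, both groups
        n1 = sum(1 for ti, _, gi in data if ti >= t and gi == 1)
        d = sum(1 for ti, ei, _ in data if ti == t and ei == 1)
        d1 = sum(1 for ti, ei, gi in data
                 if ti == t and ei == 1 and gi == 1)
        observed += d1
        expected += d * n1 / n
        if n > 1:  # hypergeometric variance is undefined when one subject remains
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return (observed - expected) ** 2 / variance


# Hypothetical toy data: group 1 fails at times 1 and 2, group 0 at times 3 and 4.
chi2 = logrank_chi2(times=[1, 2, 3, 4], events=[1, 1, 1, 1], groups=[1, 1, 0, 0])
print(round(chi2, 4))
```

The statistic sums observed minus expected events for one group over all failure times and divides the squared total by its variance, which is the quantity sts test reports as chi2(1).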
The Wilcoxon, Tarone-Ware, Peto, and Fleming-Harrington
tests can also be requested. These tests are
variations of the log rank test but weight each observation
differently. The Wilcoxon test weights the jth failure time by
$n_j$ (the number still at risk). The Tarone-Ware test weights
the jth failure time by $\sqrt{n_j}$. The Peto test weights the jth
failure time by the survival estimate $s(t_j)$ calculated over
all groups combined. This survival estimate $s(t_j)$ is similar
but not exactly equal to the Kaplan-Meier survival estimate.
The Fleming-Harrington test uses the Kaplan-Meier
survival estimate $s(t)$ over all groups to calculate its weights
for the jth failure time, $s(t_{j-1})^{p}[1 - s(t_{j-1})]^{q}$, so it takes two
arguments (p and q). The code follows (output omitted).

sts test clinic, wilcoxon

sts test clinic, tware

sts test clinic, peto

sts test clinic, fh(1,3)
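
The weighting schemes described above can be written out as a small Python helper. This is a sketch for illustration only: the function name `failure_time_weight` and its argument names are assumptions, and the Peto weight is omitted because it relies on a modified survival estimate that is not spelled out here.

```python
import math


def failure_time_weight(kind, n_at_risk, surv_prev=1.0, p=0.0, q=0.0):
    """Weight applied to the jth failure time by each test.

    n_at_risk is n_j; surv_prev is the pooled Kaplan-Meier estimate just
    before the jth failure time (used only by Fleming-Harrington).
    """
    if kind == "logrank":
        return 1.0
    if kind == "wilcoxon":
        return float(n_at_risk)
    if kind == "tware":
        return math.sqrt(n_at_risk)
    if kind == "fh":
        return surv_prev ** p * (1.0 - surv_prev) ** q
    raise ValueError("unknown test: " + kind)


w_wilcoxon = failure_time_weight("wilcoxon", 236)
w_tware = failure_time_weight("tware", 225)
w_fh = failure_time_weight("fh", 100, surv_prev=0.5, p=1, q=3)
print(w_wilcoxon, w_tware, w_fh)
```

With p = 1 and q = 3, as in fh(1,3), the weight is near zero while the pooled survival estimate is close to one, so early failure times contribute little and later differences between groups are emphasized.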

Notice that the default test for the sts test command is the log
rank test. The choice of which weighting of the test statistic to
use (e.g., log rank or Wilcoxon) depends on which test is believed
to provide the greatest statistical power, which in turn
depends on how it is believed the null hypothesis is violated.
However, one should make an a priori decision on which statistical
test to use rather than fish for a desired p-value.

A stratified log rank test for CLINIC (stratified by PRISON) 
can be run with the strata option. With the stratified approach, 
the observed minus expected number of events is summed 
over all failure times for each group within each stratum 
and then summed over all strata. The code follows (output 
omitted). 

sts test clinic, strata(prison) 

The sts generate command can be used to create a new variable
in the working dataset containing the KM survival estimates.
The following code defines a new variable called SKM
(the variable name is the user’s choice) that contains KM survival
estimates stratified by CLINIC.

sts generate skm=s, by(clinic) 

The ltable command produces life tables. Life tables are an
alternative approach to Kaplan-Meier that are particularly
useful if you do not have individual level data. The code and
output that follow provide life table survival estimates, stratified
by CLINIC, at the time points (in days) specified by the
interval() option.

ltable survt status, by(clinic) interval(60 150 200 280 365 730 1095)

                 Beg.                                Std.
   Interval     Total   Deaths   Lost   Survival    Error     [95% Conf. Int.]

clinic = 1
     0     60     163       13      4     0.9193   0.0215     0.8650    0.9523
    60    150     146       14      6     0.8293   0.0300     0.7609    0.8796
   150    200     126       13      3     0.7427   0.0352     0.6661    0.8043
   200    280     110       17      2     0.6268   0.0393     0.5446    0.6984
   280    365      91       10      6     0.5556   0.0408     0.4720    0.6313
   365    730      75       41     15     0.2181   0.0367     0.1509    0.2934
   730   1095      19       14      5     0.0330   0.0200     0.0080    0.0902
clinic = 2
     0     60      75        4      2     0.9459   0.0263     0.8624    0.9794
    60    150      69        5      3     0.8759   0.0388     0.7749    0.9334
   150    200      61        3      0     0.8328   0.0441     0.7242    0.9015
   200    280      58        5      1     0.7604   0.0508     0.6429    0.8438
   280    365      52        3      2     0.7157   0.0540     0.5943    0.8065
   365    730      47        7     23     0.5745   0.0645     0.4385    0.6890
   730   1095      17        1     16     0.5107   0.0831     0.3395    0.6584
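
The Survival column of the life table appears to follow the usual actuarial calculation, in which subjects lost during an interval are counted as at risk for half of it. The Python sketch below works under that assumption (the function name `life_table` is hypothetical) and reproduces the first rows for clinic = 1 from the Beg. Total, Deaths, and Lost columns.

```python
def life_table(rows):
    """rows: (beg_total, deaths, lost) per interval; returns cumulative survival."""
    surv, out = 1.0, []
    for beg_total, deaths, lost in rows:
        effective_n = beg_total - lost / 2.0  # actuarial adjustment for losses
        surv *= 1.0 - deaths / effective_n
        out.append(round(surv, 4))
    return out


# Beg. Total, Deaths, Lost for clinic = 1, copied from the ltable output.
clinic1 = [(163, 13, 4), (146, 14, 6), (126, 13, 3), (110, 17, 2),
           (91, 10, 6), (75, 41, 15), (19, 14, 5)]

survival = life_table(clinic1)
print(survival[:3])
```

For the first interval, for example, the estimate is 1 - 13/(163 - 2), which rounds to 0.9193.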


2. ASSESSING THE PH ASSUMPTION USING GRAPHICAL APPROACHES

Several graphical approaches for the assessment of the PH 
assumption for the variable CLINIC are demonstrated: 

1. Log-log Kaplan-Meier survival estimates (stratified by 
CLINIC) plotted against time (or against the log of time); 

2. Log-log Cox adjusted survival estimates (stratified by 
CLINIC) plotted against time; and 

3. Kaplan-Meier survival estimates and Cox adjusted survival 
estimates plotted on the same graph. 

All three approaches are somewhat subjective yet, it is hoped,
informative. The first two approaches are based on whether
the log-log survival curves are parallel for different levels of
CLINIC. The third approach is to determine if the Cox adjusted
survival curve (not stratified) is close to the KM curve.
In other words, are predicted values from the PH model (from
Cox) close to the “observed” values using KM?

The first two approaches use the stphplot command whereas 
the third approach uses the stcoxkm command. The code and 
output for the log-log Kaplan-Meier survival plots follow. 




stphplot, by(clinic) nonegative

[Figure: ln[-ln(survival probability)] by clinic (clinic = 1, clinic = 2), plotted against ln(analysis time)]


The left side of the graph seems jumpy for CLINIC = 1 but it
only represents a few events. It also looks as if there is some
separation between the plots at the later times (right side).
The nonegative option in the code requests log(-log) curves
rather than the default -log(-log) curves. The choice is arbitrary.
Without the option the curves would go downward
rather than upward (left to right).

Stata (as well as SAS) plots log(survival time) rather than survival
time on the horizontal axis by default. As far as checking
the parallel assumption, it does not matter if log(survival
time) or survival time is on the horizontal axis. However,
if the log-log survival curves look like straight lines with
log(survival time) on the horizontal axis, then there is evidence
that the “time-to-event” variable follows a Weibull distribution.
If the slope of the line equals one, then there is
evidence that the survival time variable follows an exponential
distribution, a special case of the Weibull distribution. For
these situations, a parametric survival model can be used.
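
The straight-line property can be verified numerically. The Python sketch below assumes the Weibull survival function is parameterized as S(t) = exp(-λt^p), so that ln[-ln S(t)] = ln λ + p ln t; the function names and the parameter values are illustrative assumptions only.

```python
import math


def weibull_survival(t, lam, p):
    """Weibull survival function under the assumed form S(t) = exp(-lam * t**p)."""
    return math.exp(-lam * t ** p)


def loglog_slope(t1, t2, lam, p):
    """Slope of ln[-ln S(t)] against ln(t) between two time points."""
    y1 = math.log(-math.log(weibull_survival(t1, lam, p)))
    y2 = math.log(-math.log(weibull_survival(t2, lam, p)))
    return (y2 - y1) / (math.log(t2) - math.log(t1))


slope_weibull = loglog_slope(10.0, 200.0, lam=0.001, p=1.5)
slope_exponential = loglog_slope(10.0, 200.0, lam=0.001, p=1.0)
print(round(slope_weibull, 6), round(slope_exponential, 6))
```

The slope equals the shape parameter p whichever two time points are chosen, and p = 1 gives the exponential special case with slope one.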

It may be visually more informative to graph the log-log survival
curves against survival time (rather than log survival
time). The nolntime option can be used to put survival time
on the horizontal axis. The code and output follow.








stphplot, by(clinic) nonegative nolntime

[Figure: ln[-ln(survival probability)] by clinic (clinic = 1, clinic = 2), plotted against analysis time]



The graph suggests that the curves begin to diverge over time. 

The stphplot command can also be used to obtain log-log 
Cox adjusted survival estimates. The code follows. 

stphplot, strata(clinic) adjust(prison dose) nonegative nolntime 

The log-log curves are adjusted for PRISON and DOSE using
a stratified Cox model on the variable CLINIC. The mean
values of PRISON and DOSE are used for the adjustment.
The output follows.

[Figure: Cox adjusted ln[-ln(survival probability)] curves by clinic (clinic = 1, clinic = 2), plotted against analysis time]



The Cox adjusted curves look very similar to the KM curves. 


















The stcoxkm command is used to compare Kaplan-Meier
survival estimates and Cox adjusted survival estimates plotted
on the same graph. The code and output follow.

stcoxkm, by(clinic)

[Figure: Observed vs. predicted survival probabilities by clinic (observed and predicted curves for clinic = 1 and clinic = 2)]



The KM and adjusted survival curves are very close together
for CLINIC = 1 and less so for CLINIC = 2. These graphical
approaches suggest there is some violation of the PH assumption.
The predicted values are Cox adjusted for CLINIC,
and therefore assume the PH assumption. Notice that the predicted
survival curves are not parallel by CLINIC even though
we are adjusting for CLINIC. It is the log-log survival curves,
rather than the survival curves, that are forced to be parallel
by Cox adjustment.

The same graphical analyses can be performed with PRISON 
and DOSE. However, DOSE would have to be categorized 
since it is a continuous variable. 

3. RUNNING A COX PH MODEL 

For a Cox PH model, the key assumption is that the hazard
is proportional across different patterns of covariates. The
first model that is demonstrated contains all three covariates:
PRISON, DOSE, and CLINIC. In this model, we are assuming
the same baseline hazard for all possible patterns of these covariates.
In other words, we are accepting the PH assumption
for each covariate (perhaps incorrectly). The code and output
follow.
stcox prison clinic dose, nohr

         failure _d:  status == 1
   analysis time _t:  survt
                 id:  id

Iteration 0:   log likelihood =  -705.6619
Iteration 1:   log likelihood = -674.54907
Iteration 2:   log likelihood =   -673.407
Iteration 3:   log likelihood = -673.40242
Iteration 4:   log likelihood = -673.40242
Refining estimates:
Iteration 0:   log likelihood = -673.40242

Cox regression -- Breslow method for ties

No. of subjects =        238              Number of obs   =      238
No. of failures =        150
Time at risk    =      95812
                                          LR chi2(3)      =    64.52
Log likelihood  = -673.40242              Prob > chi2     =   0.0000

      _t
      _d       Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]

  prison    .3265108   .1672211     1.95   0.051   -.0012366    .6542581
  clinic    -1.00887   .2148709    -4.70   0.000   -1.430009   -.5877304
    dose   -.0353962   .0063795    -5.55   0.000   -.0478997   -.0228926


The output indicates that it took five iterations for the log
likelihood to converge at -673.40242. The iteration history
typically appears at the top of Stata model output; however,
the iteration history will subsequently be omitted. The final table
lists the regression coefficients, their standard errors, and
a Wald test statistic (z) for each covariate, with corresponding
p-value and 95% confidence interval.

The nohr option in the stcox command requests the regression
coefficients rather than the default exponentiated coefficients
(hazard ratios). If you want the exponentiated coefficients,
omit the nohr option. The code and output follow.
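
The relationship between the two forms of output is simple arithmetic, sketched below in Python as a check rather than as a description of Stata's internals. The function name `hazard_ratio_summary` is an assumption; the coefficient and standard error are those reported for PRISON in the nohr output, and the standard error of the hazard ratio is taken to be the delta-method value (hazard ratio times the coefficient's standard error).

```python
import math


def hazard_ratio_summary(coef, se, z_crit=1.959964):
    """Convert a Cox coefficient and its standard error to the hazard ratio scale."""
    hr = math.exp(coef)
    return {
        "hr": hr,
        "se_hr": hr * se,            # delta-method standard error
        "z": coef / se,              # Wald statistic, same on both scales
        "ci": (math.exp(coef - z_crit * se), math.exp(coef + z_crit * se)),
    }


# Coefficient and standard error for PRISON from the nohr output.
prison = hazard_ratio_summary(0.3265108, 0.1672211)
print(round(prison["hr"], 4), round(prison["se_hr"], 4), round(prison["z"], 2))
print(tuple(round(limit, 3) for limit in prison["ci"]))
```

The confidence limits are obtained by exponentiating the limits for the coefficient, which is why the interval for the hazard ratio is not symmetric about the estimate.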
stcox prison clinic dose

Cox regression -- Breslow method for ties

No. of subjects =        238              Number of obs   =      238
No. of failures =        150
Time at risk    =      95812
                                          LR chi2(3)      =    64.52
Log likelihood  = -673.40242              Prob > chi2     =   0.0000

      _t
      _d  Haz. Ratio   Std. Err.       z   P>|z|    [95% Conf. Interval]

  prison    1.386123    .231789     1.95   0.051    .9987642    1.923715
  clinic    .3646309   .0783486    -4.70   0.000    .2393068    .5555868
    dose     .965223   .0061576    -5.55   0.000    .9532294    .9773675


This table contains the hazard ratios, their standard errors, and
corresponding confidence intervals. Notice that you do not
need to supply the “time-to-event” variable or the status variable
when using the stcox command. The stcox command
uses the information supplied from the stset command. A
Cox model can also be run using the cox command, which
does not rely on the stset command having previously been
run. The code follows.

cox survt prison clinic dose, dead(status) 

Notice that with the cox command, we have to list the variable 
SURVT. The dead() option is used to indicate that the variable 
STATUS distinguishes events from censorship. The variable 
used with the dead() option needs to be coded nonzero for 
events and zero for censorships. The output from the cox 
command follows. 

Cox regression -- Breslow method for ties

Entry time 0                              Number of obs   =      238
                                          LR chi2(3)      =    64.52
                                          Prob > chi2     =   0.0000
Log likelihood  = -673.40242              Pseudo R2       =   0.0457

   survt
  status       Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]

  prison    .3265108   .1672211     1.95   0.051   -.0012366    .6542581
  clinic    -1.00887   .2148709    -4.70   0.000   -1.430009   -.5877304
    dose   -.0353962   .0063795    -5.55   0.000   -.0478997   -.0228926










The output is identical to that obtained from the stcox command
except that the regression coefficients are given by default.
The hr option for the cox command supplies the exponentiated
coefficients.

Notice in the previous output that the default method of handling
ties (i.e., when multiple events happen at the same time)
is the Breslow method. If you wish to use more exact methods
you can use the exactp option (for the exact partial likelihood)
or the exactm option (for the exact marginal likelihood) in the
stcox or cox command. The exact methods are computationally
more intensive and typically have only a slight impact on the
parameter estimates. However, if there are a lot of events that
occur at the same time then exact methods are preferred. The
code and output follow.

stcox prison clinic dose, nohr exactm

Cox regression -- exact marginal likelihood

No. of subjects =         238              Number of obs  =      238
No. of failures =         150
Time at risk    =       95812
                                           LR chi2(3)     =    64.56
Log likelihood  =   -666.3274              Prob > chi2    =   0.0000

      _t |
      _d |      Coef.  Std. Err.        z    P>|z|    [95% Conf. Interval]
---------+----------------------------------------------------------------
  prison |    .326581   .1672306     1.95    0.051   -.0011849    .6543469
  clinic |  -1.009906   .2148906    -4.70    0.000   -1.431084   -.5887285
    dose |  -.0353694   .0063789    -5.54    0.000   -.0478718   -.0228669


Suppose you are interested in running a Cox model with two
interaction terms involving CLINIC. The generate command can
be used to define new variables. The variables CLIN_PR and
CLIN_DO are product terms that are defined from CLINIC x
PRISON and CLINIC x DOSE. The code follows.

generate clin_pr=clinic*prison
generate clin_do=clinic*dose

Type describe or list to see that the new variables are in the
working dataset.

The following code runs the Cox model with the two interaction
terms.

stcox prison clinic dose clin_pr clin_do, nohr
Cox regression -- Breslow method for ties

No. of subjects =         238              Number of obs  =      238
No. of failures =         150
Time at risk    =       95812
                                           LR chi2(5)     =    68.12
Log likelihood  =  -671.59969              Prob > chi2    =   0.0000

      _t |
      _d |      Coef.  Std. Err.        z    P>|z|    [95% Conf. Interval]
---------+----------------------------------------------------------------
  prison |   1.191998   .5413685     2.20    0.028    .1309348    2.253061
  clinic |   .1746985    .893116     0.20    0.845   -1.575777    1.925174
    dose |  -.0193175     .01935    -1.00    0.318   -.0572428    .0186079
 clin_pr |  -.7379931   .4314868    -1.71    0.087   -1.583692    .1077055
 clin_do |  -.0138608   .0143275    -0.97    0.333   -.0419422    .0142206


The lrtest command can be used to perform likelihood ratio
tests. For example, to perform a likelihood ratio test on the
two interaction terms CLIN_PR and CLIN_DO in the preceding
model, we can save the -2 log likelihood statistic of the
full model in the computer's memory by typing the following
command.

lrtest, saving(0)

Now the reduced model (without the interaction terms) can 
be run (output omitted) by typing: 

stcox prison clinic dose 

After the reduced model is run, the following command provides
the results of the likelihood ratio test comparing the full
model (with the interaction terms) to the reduced model.

lrtest

The resulting output follows.

Cox: likelihood-ratio test                 chi2(2)        =     3.61
                                           Prob > chi2    =   0.1648

The p-value of 0.1648 is not significant at the alpha = 0.05
level.
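
The test statistic can be recovered by hand from the two log
likelihoods printed earlier (-671.59969 for the full model and
-673.40242 for the reduced model). The Python sketch below is only
an arithmetic check, not Stata code; it relies on the fact that the
upper tail of a chi-square distribution with 2 degrees of freedom
is exp(-x/2).

```python
import math

loglik_full = -671.59969     # model with CLIN_PR and CLIN_DO
loglik_reduced = -673.40242  # model without the interaction terms

# Likelihood ratio statistic: difference in -2 log likelihood.
lr_stat = -2 * (loglik_reduced - loglik_full)

# Upper-tail probability for a chi-square with 2 degrees of freedom.
p_value = math.exp(-lr_stat / 2)

print(round(lr_stat, 2), round(p_value, 4))
```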


4. RUNNING A STRATIFIED COX MODEL 

If the proportional hazard assumption is not met for the variable
CLINIC, but is met for the variables PRISON and DOSE, then a
stratified Cox analysis can be performed. The stcox command can
be used to run a stratified Cox model. The following code (with
output) runs a Cox model stratified on CLINIC.


stcox prison dose, strata(clinic) 


Stratified Cox regr. -- Breslow method for ties

No. of subjects =         238              Number of obs  =      238
No. of failures =         150
Time at risk    =       95812
                                           LR chi2(2)     =    33.94
Log likelihood  =    -597.714              Prob > chi2    =   0.0000

      _t |
      _d | Haz. Ratio  Std. Err.        z    P>|z|    [95% Conf. Interval]
---------+----------------------------------------------------------------
  prison |   1.475192   .2491827     2.30    0.021    1.059418    2.054138
    dose |   .9654655   .0062418    -5.44    0.000     .953309     .977777
                                                      Stratified by clinic


The strata() option allows up to five stratification variables.

A stratified Cox model can be run including the two interaction
terms. Recall that the generate command created these variables
in the previous section. This model allows for the effect of
PRISON and DOSE to differ for different values of CLINIC. The
code and output follow.

stcox prison dose clin_pr clin_do, strata(clinic) nohr


Stratified Cox regr. -- Breslow method for ties

No. of subjects =         238              Number of obs  =      238
No. of failures =         150
Time at risk    =       95812
                                           LR chi2(4)     =    35.81
Log likelihood  =  -596.77891              Prob > chi2    =   0.0000

      _t |
      _d |      Coef.  Std. Err.        z    P>|z|    [95% Conf. Interval]
---------+----------------------------------------------------------------
  prison |   1.087282   .5386163     2.02    0.044    .0316135    2.142951
    dose |  -.0348039   .0197969    -1.76    0.079   -.0736051    .0039973
 clin_pr |   -.584771   .4281291    -1.37    0.172   -1.423889    .2543465
 clin_do |  -.0010622    .014569    -0.07    0.942   -.0296169    .0274925
                                                      Stratified by clinic


Suppose we wish to estimate the hazard ratio for PRISON = 1 vs.
PRISON = 0 for CLINIC = 2. This hazard ratio can be estimated by
exponentiating the coefficient for prison plus 2 times the
coefficient for the clinic-prison interaction term. This
expression is obtained by substituting the appropriate values
into the hazard in both the numerator (for PRISON = 1) and
denominator (for PRISON = 0) (see below).

\[
HR = \frac{h_0(t)\exp[(1)\beta_1 + \beta_2\,\mathrm{DOSE} + (2)(1)\beta_3 + \beta_4\,\mathrm{CLIN\_DO}]}
          {h_0(t)\exp[(0)\beta_1 + \beta_2\,\mathrm{DOSE} + (2)(0)\beta_3 + \beta_4\,\mathrm{CLIN\_DO}]}
   = \exp(\beta_1 + 2\beta_3)
\]

The lincom command can be used to exponentiate linear
combinations of parameters. Run this command directly after
running the model to estimate the HR for PRISON where
CLINIC = 2. The code and output follow.

lincom prison+2*clin_pr, hr

 ( 1)  prison + 2.0 clin_pr = 0.0

      _t | Haz. Ratio  Std. Err.        z    P>|z|    [95% Conf. Interval]
---------+----------------------------------------------------------------
     (1) |   .9210324   .3539571    -0.21    0.831    .4336648    1.956121

Models can also be run on a subset of the data using the if
statement. The following code (with output) runs a Cox model on
the data where CLINIC = 2.


stcox prison dose if clinic==2 
Cox regression -- Breslow method for ties

No. of subjects =          75              Number of obs  =       75
No. of failures =          28
Time at risk    =       36254
                                           LR chi2(2)     =     9.70
Log likelihood  =  -104.37135              Prob > chi2    =   0.0078

      _t |
      _d | Haz. Ratio  Std. Err.        z    P>|z|    [95% Conf. Interval]
---------+----------------------------------------------------------------
  prison |   .9210324   .3539571    -0.21    0.831    .4336648    1.956121
    dose |   .9637452   .0118962    -2.99    0.003    .9407088    .9873457


The hazard ratio estimates for PRISON = 1 vs. PRISON = 0 
(for CLINIC = 2) are exactly the same using the stratified Cox 
approach with product terms and the subset data approach 
(0.9210324). 
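
The product-term estimate can also be reproduced by hand from the
stratified interaction model, using exp(b1 + 2*b3) with the printed
coefficients for prison and clin_pr. The Python lines below are an
arithmetic check only (the variable names are illustrative, not
Stata objects).

```python
import math

b_prison = 1.087282    # coefficient for prison (stratified model with product terms)
b_clin_pr = -0.584771  # coefficient for clin_pr
clinic = 2             # clinic value at which the hazard ratio is evaluated

# HR for PRISON = 1 vs. PRISON = 0 when CLINIC = 2.
hr_prison_clinic2 = math.exp(b_prison + clinic * b_clin_pr)
print(f"{hr_prison_clinic2:.6f}")
```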

5. ASSESSING THE PH ASSUMPTION USING 
A STATISTICAL TEST 

The stphtest command can be used to perform a statistical
test of the PH assumption. A statistical test gives objective
criteria for assessing the PH assumption compared to using
the graphical approach. This does not mean that this statistical
test is better than the graphical approach. It is just more
objective. In fact, the graphical approach is generally more
informative for descriptively characterizing the form of a PH
violation.

The command stphtest outputs a PH global test for all the 
covariates simultaneously and can also be used to obtain a 
test for each covariate separately with the detail option. To 
run these tests, you must obtain Schoenfeld residuals for the 
global test, and Schoenfeld scaled residuals for separate tests 
with each covariate. The idea behind the PH test is that if 
the PH assumption is satisfied, then the residuals should not 
be correlated with survival time (or ranked survival time). On 
the other hand, if the residuals tend to be positive for subjects 
who become events at a relatively early time and negative 
for subjects who become events at a relatively late time (or 
vice versa), then there is evidence that the hazard ratio is not 
constant over time (i.e., PH assumption is violated).

Before the stphtest can be implemented, the stcox command
needs to be run to obtain the Schoenfeld residuals (with the
schoenfeld() option) and the scaled Schoenfeld residuals (with
the scaledsch() option). In the parentheses are the names of
newly defined variables; schoen* creates SCHOEN1, SCHOEN2,
and SCHOEN3 whereas scaled* creates SCALED1, SCALED2, and
SCALED3. These variables contain the residuals for PRISON,
DOSE, and CLINIC, respectively (the order that the variables
were entered in the model). The user is free to type any
variable name in the parentheses. The Schoenfeld residuals are
used for the global test and the scaled Schoenfeld residuals
are used for the testing of the PH assumption for individual
variables.

stcox prison dose clinic, schoenfeld(schoen*) scaledsch(scaled*) 

Once the residuals are defined, the stphtest command can be 
run. The code and output follow. 

stphtest, rank detail 


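Test of proportional hazards assumption
Time: Rank(t)

            |       rho      chi2     df    Prob>chi2
------------+------------------------------------------
     prison |  -0.04645      0.32      1       0.5689
       dose |   0.08975      1.08      1       0.2996
     clinic |  -0.24927     10.44      1       0.0012
------------+------------------------------------------
global test |               12.36      3       0.0062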


The tests suggest that the PH assumption is violated for 
CLINIC with the p-value at 0.0012. The tests do not suggest 
violation of the PH assumption for PRISON or DOSE. 

The plot() option of the stphtest command can be used to
produce a plot of the scaled Schoenfeld residuals for CLINIC
against survival time ranking. If the PH assumption is met, the
fitted curve should look horizontal because the scaled
Schoenfeld residuals would be independent of survival time. The
code and graph follow.

stphtest, rank plot(clinic)

[Figure: Test of PH Assumption]

The fitted curve slopes slightly downward (not horizontally).

6. OBTAINING COX ADJUSTED SURVIVAL CURVES 

Adjusted survival curves can be obtained with the sts graph 
command. Adjusted survival curves depend on the pattern 
of covariates. For example, the adjusted survival estimates 
for a subject with PRISON = 1, CLINIC = 1, and DOSE = 
40 are generally different than for a subject with PRISON = 
0, CLINIC = 2, and DOSE = 70. The sts graph command 
produces adjusted baseline survival curves. The following 
code produces an adjusted survival plot with PRISON = 0, 
CLINIC = 0, and DOSE = 0 (output omitted). 

sts graph, adjustfor(prison dose clinic) 

It is probably of more interest to create adjusted plots for
reasonable patterns of covariates (CLINIC = 0 is not even a valid
value). Suppose we are interested in graphing the adjusted
survival curve for PRISON = 0, CLINIC = 2, and DOSE = 70.
We can create new variables with the generate command
that can be used with the sts graph command.

generate clinic2=clinic-2 
generate dose70=dose-70 

These variables (PRISON, CLINIC2, and DOSE70) produce
the desired pattern of covariates when each is set to zero. The
following code produces the desired results.

sts graph, adjustfor(prison dose70 clinic2)

[Figure: Survivor function, adjusted for prison dose70 clinic2]
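
To see why recentering works, note that in a Cox model the baseline
curve is the curve for a subject whose linear predictor is zero. The
Python sketch below is an illustration only: it borrows the
coefficients from the unstratified Cox model fitted earlier and uses
a hypothetical helper, linear_predictor, to show that the target
pattern (PRISON = 0, CLINIC = 2, DOSE = 70) has a linear predictor
of zero once CLINIC and DOSE are shifted by 2 and 70.

```python
# Coefficients from the earlier Cox model (log hazard ratios).
beta = {"prison": 0.3265108, "clinic": -1.00887, "dose": -0.0353962}

# Covariate pattern of interest and the two choices of "zero point".
target = {"prison": 0, "clinic": 2, "dose": 70}
origin = {"prison": 0, "clinic": 0, "dose": 0}


def linear_predictor(x, centre):
    """Sum of beta * (x - centre) over the covariates (hypothetical helper)."""
    return sum(beta[k] * (x[k] - centre[k]) for k in beta)


# With the original coding, the target pattern is far from the baseline.
lp_original = linear_predictor(target, origin)

# After subtracting 2 from CLINIC and 70 from DOSE, the target IS the baseline.
lp_recentred = linear_predictor(target, target)

print(lp_original, lp_recentred)
```

Shifting a covariate by a constant leaves its coefficient unchanged,
so only the baseline curve moves.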



Adjusted stratified Cox survival curves can be obtained with
the strata() option. The following code creates two survival
curves stratified by clinic (CLINIC = 1, PRISON = 0, and
DOSE = 70) and (CLINIC = 2, PRISON = 0, and DOSE = 70).

sts graph, strata(clinic) adjustfor(prison dose70)

[Figure: Survivor functions, by clinic, adjusted for prison dose70]



The adjusted curves suggest there is a strong effect from
CLINIC on survival.

Suppose the interest is in comparing adjusted survival plots
of PRISON = 1 to PRISON = 0 stratified by CLINIC. In this
setting, the sts graph command cannot be used directly because
we cannot simultaneously define both levels of prison
(PRISON = 1 and PRISON = 0) as the baseline level (recall
sts graph plots only the baseline survival function). However,
survival estimates can be obtained using the sts generate
command twice; once where PRISON = 0 is defined as baseline
and once where PRISON = 1 is defined as baseline. The following
code creates variables containing the desired adjusted survival
estimates.

generate prison1=prison-1

sts generate scox0=s, strata(clinic) adjustfor(prison dose70)
sts generate scox1=s, strata(clinic) adjustfor(prison1 dose70)

The variables SCOX1 and SCOX0 contain the survival estimates
for PRISON = 1 and PRISON = 0, respectively, adjusting for dose
and stratifying by clinic. The graph command is used to plot
these estimates. If you are using a higher version of Stata than
Stata 7.0 (e.g., Stata 8.0), then you should replace the graph
command with the graph7 command. The code and output follow.

graph scox0 scox1 survt, twoway symbol([clinic] [clinic]) xlabel(365,730,1095)

[Figure: S(t+0), adjusted (SCOX0 and SCOX1) against survival time in
days, points labeled by clinic; x-axis labels at 365, 730, and 1095]

We can also graph PRISON = 1 and PRISON = 0 with the
data subset where CLINIC = 1. The option twoway requests
a two-way scatterplot. The options symbol, xlabel, and title
request the symbols, axis labels, and title, respectively.

graph7 scox0 scox1 survt if clinic==1, twoway symbol(ox) xlabel(365,730,1095)
t1("symbols O for prison=0, x for prison=1") title("subsetted by clinic==1")

[Figure: adjusted survival estimates against survival time in days,
subsetted by clinic==1; symbols O for prison=0, x for prison=1;
x-axis labels at 365, 730, and 1095]


7. RUNNING AN EXTENDED COX MODEL 

If the PH assumption is not satisfied, a possible strategy is to 
run a stratified Cox model. Another strategy is to run a Cox 
model with time-varying covariates (an extended Cox model). 
The challenge of running an extended Cox model is to choose 
the appropriate function of survival time to include in the 
model. 

Suppose we want to include a time-dependent covariate
DOSE times the log of time. This product term could be
appropriate if the hazard ratio comparing any two levels of DOSE
monotonically increases (or decreases) over time. The tvc()
option of the stcox command can be used to declare DOSE a
time-varying covariate that will be multiplied by a function of
time. The specification of that function of time is stated in the
texp() option with the variable _t representing time. The code
and output for a model containing the time-varying covariate,
DOSE x ln(_t), follow.

stcox prison clinic dose, tvc(dose) texp(ln(_t)) nohr

Cox regression -- Breslow method for ties

No. of subjects =         238              Number of obs  =      238
No. of failures =         150
Time at risk    =       95812
                                           LR chi2(4)     =    66.29
Log likelihood  =  -672.51694              Prob > chi2    =   0.0000

      _t |
      _d |      Coef.  Std. Err.        z    P>|z|    [95% Conf. Interval]
---------+----------------------------------------------------------------
rh       |
  prison |   .3404817   .1674672     2.03    0.042     .012252    .6687113
  clinic |  -1.018682    .215385    -4.73    0.000   -1.440829   -.5965352
    dose |  -.0824307   .0359866    -2.29    0.022   -.1529631   -.0118982
---------+----------------------------------------------------------------
t        |
    dose |   .0085751   .0064554     1.33    0.184   -.0040772    .0212274

note: second equation contains variables that continuously vary
      with respect to time; variables interact with current
      values of ln(_t).


The parameter estimate for the time-dependent covariate
DOSE x ln(_t) is .0085751; however, it is not statistically
significant, with a Wald test p-value of 0.184.
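
The Wald statistic and its p-value follow directly from the printed
coefficient and standard error. The Python sketch below is a hand
check rather than Stata code; it assumes the usual two-sided test
against the standard normal distribution.

```python
import math

coef = 0.0085751       # coefficient for DOSE x ln(_t)
std_err = 0.0064554    # its standard error

# Wald z statistic and two-sided p-value from the standard normal.
z = coef / std_err
p_value = math.erfc(abs(z) / math.sqrt(2))

print(round(z, 2), round(p_value, 3))
```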

A Heaviside function can also be used. The following code
runs a model with a time-dependent variable equal to CLINIC
if time is greater than or equal to 365 days and 0 otherwise.

stcox prison dose clinic, tvc(clinic) texp(_t >= 365) nohr

Stata recognizes the expression (_t >= 365) as taking the value
1 if survival time is >= 365 days and 0 otherwise. The output
follows.

Cox regression -- Breslow method for ties

No. of subjects =         238              Number of obs  =      238
No. of failures =         150
Time at risk    =       95812
                                           LR chi2(4)     =    74.17
Log likelihood  =  -668.57443              Prob > chi2    =   0.0000

      _t |
      _d |      Coef.  Std. Err.        z    P>|z|    [95% Conf. Interval]
---------+----------------------------------------------------------------
rh       |
  prison |    .377704   .1684024     2.24    0.025    .0476414    .7077666
    dose |  -.0355116   .0064354    -5.52    0.000   -.0481247   -.0228985
  clinic |  -.4595628   .2552911    -1.80    0.072    -.959924    .0407985
---------+----------------------------------------------------------------
t        |
  clinic |  -1.368665   .4613948    -2.97    0.003   -2.272982    -.464348

note: second equation contains variables that continuously vary
      with respect to time; variables interact with current
      values of _t >= 365.


Unfortunately, the texp option can only be used once in the
stcox command. This makes it more difficult to run the
equivalent model with two Heaviside functions. However, it can
be accomplished using the stsplit command, which adds extra
observations to the working dataset. The following code creates
a variable called V1 and adds new observations to the dataset.

stsplit v1, at(365)

After the above stsplit command is executed, any subject
followed more than 365 days is represented by two observations
rather than one. For example, the first subject (ID = 1) had
an event on the 428th day; the first observation for that
subject shows no event between 0 and 365 days whereas the
second observation shows an event on the 428th day. The newly
defined variable v1 has the value 365 for observations covering
follow-up after day 365 (those with _t0 = 365) and 0 otherwise.
The following code lists the first 10 observations for the
requested variables (output follows).

list id _t0 _t _d clinic v1 in 1/10

        id    _t0     _t    _d   clinic     v1
  1.     1      0    365     0        1      0
  2.     1    365    428     1        1    365
  3.     2      0    275     1        1      0
  4.     3      0    262     1        1      0
  5.     4      0    183     1        1      0
  6.     5      0    259     1        1      0
  7.     6      0    365     0        1      0
  8.     6    365    714     1        1    365
  9.     7      0    365     0        1      0
 10.     7    365    438     1        1    365
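
The episode splitting shown in the listing can be mimicked in a few
lines of Python. This is only a sketch of the record layout for a
single cut point, with a hypothetical function name (split_at); it
is not how Stata implements stsplit.

```python
def split_at(t0, t, d, cut):
    """Split one follow-up record (t0, t] at `cut`, as in the listing above.

    Returns a list of dicts with the same columns as the Stata listing.
    The event indicator d is carried only by the last episode.
    """
    if t0 < cut < t:
        return [
            {"_t0": t0, "_t": cut, "_d": 0, "v1": 0},   # follow-up before the cut
            {"_t0": cut, "_t": t, "_d": d, "v1": cut},  # follow-up after the cut
        ]
    # Records that do not straddle the cut point are left as a single episode.
    return [{"_t0": t0, "_t": t, "_d": d, "v1": cut if t0 >= cut else 0}]


# Subject 1: event on day 428, so two episodes are produced.
subject1 = split_at(0, 428, 1, 365)
# Subject 2: event on day 275, so the record is unchanged.
subject2 = split_at(0, 275, 1, 365)

print(subject1)
print(subject2)
```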


With the data in this form, two Heaviside functions can actually
be defined in the data using the following code.

generate hv2=clinic*(v1/365)
generate hv1=clinic*(1-(v1/365))

The following code and output list a sample of the observations
(in 159/167) with the observation number suppressed (the noobs
option).

list id _t0 _t clinic v1 hv1 hv2 in 159/167, noobs


    id    _t0     _t   clinic     v1    hv1    hv2
   100      0    365        1      0      1      0
   100    365    749        1    365      0      1
   101      0    150        1      0      1      0
   102      0    365        1      0      1      0
   102    365    465        1    365      0      1
   103      0    365        2      0      2      0
   103    365    708        2    365      0      2
   104      0    365        2      0      2      0
   104    365    713        2    365      0      2
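
The two generate statements amount to switching CLINIC on in exactly
one of the two time windows. The Python sketch below restates them
with an illustrative function name (heaviside_terms) and checks the
result against rows of the listing.

```python
def heaviside_terms(clinic, v1, cut=365):
    """Return (hv1, hv2) for one split record.

    v1 is 0 for the episode before the cut point and equal to the cut
    point for the episode after it, so v1 / cut is a 0/1 indicator.
    """
    late = v1 / cut
    hv1 = clinic * (1 - late)  # CLINIC during the first 365 days, else 0
    hv2 = clinic * late        # CLINIC after 365 days, else 0
    return hv1, hv2


print(heaviside_terms(1, 0))    # id 100, first episode
print(heaviside_terms(1, 365))  # id 100, second episode
print(heaviside_terms(2, 365))  # id 103, second episode
```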


With the two Heaviside functions defined in the split data, a
time-dependent model using these functions can be run with
the following code (the output follows).

stcox prison clinic dose hv1 hv2, nohr

No. of subjects =         238              Number of obs  =      360
No. of failures =         150
Time at risk    =       95812
                                           LR chi2(4)     =    74.17
Log likelihood  =  -668.57443              Prob > chi2    =   0.0000

      _t |
      _d |      Coef.  Std. Err.        z    P>|z|    [95% Conf. Interval]
---------+----------------------------------------------------------------
  prison |    .377704   .1684024     2.24    0.025    .0476414    .7077666
    dose |  -.0355116   .0064354    -5.52    0.000   -.0481247   -.0228985
     hv1 |  -.4595628   .2552911    -1.80    0.072    -.959924    .0407985
     hv2 |  -1.828228    .385946    -4.74    0.000   -2.584668   -1.071788
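
The two parameterizations describe the same model, which can be seen
by comparing coefficients: in the tvc() model the effect of CLINIC
after 365 days is the sum of the CLINIC coefficient and the
time-dependent CLINIC coefficient, and that sum should match the
hv2 coefficient. The Python lines below are a hand check using the
printed estimates (the variable names are illustrative).

```python
import math

# Model with tvc(clinic) texp(_t >= 365)
b_clinic = -0.4595628      # CLINIC effect before day 365
b_clinic_tvc = -1.368665   # additional CLINIC effect from day 365 onward

# Model with two Heaviside functions on the split data
b_hv1 = -0.4595628
b_hv2 = -1.828228

late_effect = b_clinic + b_clinic_tvc

# Hazard ratios for CLINIC in each time window.
hr_first_year = math.exp(b_hv1)
hr_later = math.exp(b_hv2)

print(round(late_effect, 6), round(hr_first_year, 3), round(hr_later, 3))
```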


The stsplit command is complicated but it offers a powerful
approach for manipulating the data to accommodate
time-varying analyses.

If you wish to return the data to their previous form, drop the
variables that were created from the split and then use the
stjoin command:

drop v1 hv1 hv2
stjoin

It is possible to split the data at every single failure time, but 
this uses a large amount of memory. However, if there is only 
one time-varying covariate in the model, the simplest way 
to run an extended Cox model is by using the tvc and texp 
options with the stcox command. 

One should not confuse an individual's survival time variable
(the outcome variable) with the variable used to define the
time-dependent variable (_t in Stata). The individual's survival
time variable is a time-independent variable. The time of the
individual's event (or censorship) does not change. A
time-dependent variable, on the other hand, is defined so that
it can change its values over time.

8. RUNNING PARAMETRIC MODELS 

The Cox PH model is the most widely used model in survival
analysis. A key reason why it is so popular is that the
distribution of the survival time variable need not be specified.
However, if it is believed that survival time follows a
particular distribution, then that information can be utilized in
a parametric modeling of survival data.

Many parametric models are accelerated failure time (AFT)
models. Whereas the key assumption of a PH model is that
hazard ratios are constant over time, the key assumption for
an AFT model is that survival time accelerates (or decelerates)
by a constant factor when comparing different levels of
covariates.

The most common distribution for parametric modeling of
survival data is the Weibull distribution. The Weibull
distribution has the desirable property that if the AFT
assumption holds, then the PH assumption also holds. The
exponential distribution is a special case of the Weibull
distribution. The key property for the exponential distribution
is that the hazard is constant over time (not just the hazard
ratio). The Weibull and exponential models can be run as a PH
model (the default) or an AFT model.

A graphical method for checking the validity of a Weibull
assumption is to examine Kaplan-Meier log-log survival curves
against log survival time. This is accomplished with the sts
graph command (see Section 2 of this appendix). If the plots
are straight lines, then there is evidence that the distribution
of survival times follows a Weibull distribution. If the slope of
the line equals one, then the evidence suggests that survival
time follows an exponential distribution.

The streg command is used to run parametric models. Even 
though the log-log survival curves obtained using the addicts 
dataset are not straight lines, the data are used for illustration. 
First a parametric model using the exponential distribution 
is demonstrated. The code and output follow. 

streg prison dose clinic, dist(exponential) nohr

Exponential regression -- log relative-hazard form

No. of subjects =         238              Number of obs  =      238
No. of failures =         150
Time at risk    =       95812
                                           LR chi2(3)     =    49.91
Log likelihood  =  -270.47929              Prob > chi2    =   0.0000

      _t |      Coef.  Std. Err.        z    P>|z|    [95% Conf. Interval]
---------+----------------------------------------------------------------
  prison |   .2526491   .1648862     1.53    0.125    -.070522    .5758201
    dose |  -.0289167   .0061445    -4.71    0.000   -.0409596   -.0168738
  clinic |  -.8805819    .210626    -4.18    0.000   -1.293401   -.4677625
   _cons |  -3.684341   .4307163    -8.55    0.000   -4.528529   -2.840152

The distribution is specified with the dist() option. The stcurv
command can be used following the streg command to obtain
fitted survival, hazard, or cumulative hazard curves. The
following code obtains the estimated hazard function for
PRISON = 0, DOSE = 40, and CLINIC = 1.

stcurv, hazard at(prison=0 dose=40 clinic=1)

[Figure: hazard function against analysis time, Exponential regression]

The graph illustrates the fact that the hazard is constant over 
time if survival time follows an exponential distribution. 
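
The height of that constant hazard can be worked out from the
coefficients in the exponential output: in the log relative-hazard
form the hazard is the exponential of the linear predictor,
including the constant. The Python sketch below is a hand
calculation for PRISON = 0, DOSE = 40, and CLINIC = 1 (dictionary
names are illustrative).

```python
import math

# Coefficients from the exponential PH model above.
coef = {"_cons": -3.684341, "prison": 0.2526491,
        "dose": -0.0289167, "clinic": -0.8805819}

# Covariate pattern used in the stcurv call.
x = {"prison": 0, "dose": 40, "clinic": 1}

# Exponential PH model: h(t) = exp(b0 + b1*x1 + ...), the same at every t.
log_hazard = coef["_cons"] + sum(coef[k] * x[k] for k in x)
hazard = math.exp(log_hazard)

print(f"{hazard:.5f}")
```

The result is a daily hazard, because survival time in this dataset
is measured in days.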


Next a Weibull distribution is run using the streg command.

streg prison dose clinic, dist(weibull) nohr

Weibull regression -- log relative-hazard form

No. of subjects =         238              Number of obs  =      238
No. of failures =         150
Time at risk    =       95812
                                           LR chi2(3)     =    60.89
Log likelihood  =  -260.98467              Prob > chi2    =   0.0000

      _t |      Coef.  Std. Err.        z    P>|z|    [95% Conf. Interval]
---------+----------------------------------------------------------------
  prison |   .3144143   .1659462     1.89    0.058   -.0108342    .6396628
    dose |  -.0334675    .006255    -5.35    0.000   -.0457272   -.0212079
  clinic |  -.9715245   .2122826    -4.58    0.000   -1.387591   -.5554582
   _cons |  -5.624436   .6588041    -8.54    0.000   -6.915668   -4.333203
---------+----------------------------------------------------------------
   /ln_p |   .3149526   .0675583     4.66    0.000    .1825408    .4473644
---------+----------------------------------------------------------------
       p |   1.370194    .092568                      1.200263    1.564184
     1/p |   .7298235   .0493056                      .6393109    .8331507

Notice that the Weibull output has a parameter p that the
exponential distribution does not have. The hazard function for
a Weibull distribution is $\lambda p t^{p-1}$. If p = 1 then the
Weibull distribution is also an exponential distribution
($h(t) = \lambda$). Hazard ratio parameters are given by default
for the Weibull distribution. If you want the parameterization
for an AFT model, then use the time option.
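
The role of the shape parameter can be illustrated numerically. The
Python sketch below evaluates the Weibull hazard, lambda * p *
t^(p-1), for PRISON = 0, DOSE = 40, and CLINIC = 1, taking lambda as
the exponential of the linear predictor from the PH output above.
The function name is illustrative, and the comparison with p = 1 is
included only to show the exponential special case.

```python
import math


def weibull_hazard(t, lam, p):
    """Weibull hazard: lambda * p * t**(p - 1)."""
    return lam * p * t ** (p - 1)


# lambda for PRISON = 0, DOSE = 40, CLINIC = 1 from the Weibull PH output.
lam = math.exp(-5.624436 + (-0.0334675) * 40 + (-0.9715245) * 1)
p_hat = 1.370194

for t in (100, 365, 730):
    print(t, weibull_hazard(t, lam, p_hat), weibull_hazard(t, lam, 1.0))
```

With p greater than 1 the first column of hazards grows with t,
while with p = 1 the hazard stays at lambda.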

The code and output for a Weibull AFT model follow. 


streg prison dose clinic, dist(weibull) time

Weibull regression -- accelerated failure-time form

No. of subjects =         238              Number of obs  =      238
No. of failures =         150
Time at risk    =       95812
                                           LR chi2(3)     =    60.89
Log likelihood  =  -260.98467              Prob > chi2    =   0.0000

      _t |      Coef.  Std. Err.        z    P>|z|    [95% Conf. Interval]
---------+----------------------------------------------------------------
  prison |  -.2294669   .1207889    -1.90    0.057   -.4662088    .0072749
    dose |   .0244254   .0045898     5.32    0.000    .0154295    .0334213
  clinic |   .7090414   .1572246     4.51    0.000    .4008867    1.017196
   _cons |   4.104845   .3280583    12.51    0.000    3.461863    4.747828
---------+----------------------------------------------------------------
   /ln_p |   .3149526   .0675583     4.66    0.000    .1825408    .4473644
---------+----------------------------------------------------------------
       p |   1.370194    .092568                      1.200263    1.564184
     1/p |   .7298235   .0493056                      .6393109    .8331507


The relationship between the hazard ratio parameter $\beta_j$ and
the AFT parameter $\alpha_j$ is $\beta_j = -\alpha_j p$. For
example, using the coefficient estimates for PRISON in the
Weibull PH and AFT models yields the relationship
0.3144 = (-0.2295)(-1.37), that is, 0.3144 = -(-0.2295)(1.37).
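
The same relationship can be checked for all three predictors from
the two Weibull outputs. The Python sketch below is an arithmetic
check only; the dictionary names are illustrative, and p is
recovered from the printed /ln_p estimate.

```python
import math

p = math.exp(0.3149526)  # shape parameter, from /ln_p

# AFT coefficients (time option) and PH coefficients (default) as printed.
aft = {"prison": -0.2294669, "dose": 0.0244254, "clinic": 0.7090414}
ph_reported = {"prison": 0.3144143, "dose": -0.0334675, "clinic": -0.9715245}

# PH coefficient implied by each AFT coefficient: beta = -alpha * p.
ph_from_aft = {name: -alpha * p for name, alpha in aft.items()}

for name in aft:
    print(name, round(ph_from_aft[name], 6), ph_reported[name])
```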

The stcurv can again be used following the streg command 
to obtain fitted survival, hazard, or cumulative hazard curves. 
The following code obtains the estimated hazard function for 
PRISON = 0, DOSE = 40, and CLINIC = 1. 


stcurv, hazard at(prison=0 dose=40 clinic=1)

[Figure: hazard function against analysis time, Weibull regression]

The plot of the hazard is monotonically increasing. With a
Weibull distribution, the hazard is constrained such that it
cannot increase and then decrease. This is not the case with
the log-logistic distribution as demonstrated in the next
example. The log-logistic model is not a PH model, so the default
model for the streg command is an AFT model. The code and
output follow.

streg prison dose clinic, dist(loglogistic) 


Log-logistic regression -- accelerated failure-time form

No. of subjects =         238              Number of obs  =      238
No. of failures =         150
Time at risk    =       95812
                                           LR chi2(3)     =    52.18
Log likelihood  =  -270.42329              Prob > chi2    =   0.0000

      _t |      Coef.  Std. Err.        z    P>|z|    [95% Conf. Interval]
---------+----------------------------------------------------------------
  prison |  -.2912719   .1439646    -2.02    0.043   -.5734373   -.0091065
    dose |   .0316133   .0055192     5.73    0.000    .0207959    .0424307
  clinic |   .5805977   .1715695     3.38    0.001    .2443276    .9168677
   _cons |   3.563268   .3894467     9.15    0.000    2.799967     4.32657
---------+----------------------------------------------------------------
 /ln_gam |  -.5331424   .0686297    -7.77    0.000   -.6676541   -.3986306
---------+----------------------------------------------------------------
   gamma |   .5867583    .040269                      .5129104    .6712386


Note that Stata calls the shape parameter gamma for a
log-logistic model. The code to produce the graph of the hazard
function for PRISON = 0, DOSE = 40, and CLINIC = 1 follows.

stcurv, hazard at(prison=0 dose=40 clinic=1)

[Figure: hazard function against analysis time, Log-logistic regression]

The hazard function (in contrast to the Weibull hazard function)
first increases and then decreases.
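
The rise and fall of the hazard can be reproduced from the printed
estimates. The Python sketch below assumes the log-logistic
parameterization S(t) = 1 / (1 + (lambda t)^(1/gamma)) with
lambda = exp(-xb), which is how the AFT form is understood here; that
parameterization, the function name, and the one-day grid are
assumptions of this sketch rather than something stated in the
output.

```python
import math

# Log-logistic AFT estimates from the output above.
coef = {"_cons": 3.563268, "prison": -0.2912719,
        "dose": 0.0316133, "clinic": 0.5805977}
gamma = 0.5867583

# Covariate pattern used in the stcurv call.
x = {"prison": 0, "dose": 40, "clinic": 1}
xb = coef["_cons"] + sum(coef[k] * x[k] for k in x)
lam = math.exp(-xb)  # assumed: lambda = exp(-xb) in the AFT form


def loglogistic_hazard(t):
    """Hazard implied by S(t) = 1 / (1 + (lam * t) ** (1 / gamma))."""
    u = (lam * t) ** (1 / gamma)
    return u / (gamma * t * (1 + u))


# Locate the day on which the hazard is largest over about three years.
days = range(1, 1096)
peak_day = max(days, key=loglogistic_hazard)
print(peak_day, loglogistic_hazard(peak_day))
```

Because the estimated gamma is less than 1, the hazard under this
parameterization has a single interior maximum.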

The corresponding survival curve for the log-logistic
distribution can also be obtained with the stcurv command.

stcurv, survival at(prison=0 dose=40 clinic=1)

[Figure: survival function against analysis time, Log-logistic regression]


If the AFT assumption holds for a log-logistic model, then the
proportional odds assumption holds for the survival function
(although the PH assumption would not hold). The proportional
odds assumption can be evaluated by plotting the log odds of
survival (using KM estimates) against the log of survival time.
If the plots are straight lines for each pattern of covariates,
then the log-logistic distribution is reasonable. If the
straight lines are also parallel, then the proportional odds
and AFT assumptions also hold. The following code will plot
the estimated log odds of survival against the log of time by
CLINIC (output omitted).

sts generate skm=s, by(clinic)
generate logodds=ln(skm/(1-skm))
generate logt=ln(survt)

graph7 logodds logt, twoway symbol([clinic] [clinic])

Another context for thinking about the proportional odds
assumption is that the odds ratio estimated by a logistic
regression does not depend on the length of the follow-up. For
example, if a follow-up study was extended from three to five
years then the underlying odds ratio comparing two patterns
of covariates would not change. If the proportional odds
assumption is not true, then the odds ratio is specific to the
length of follow-up.

Both the log-logistic and Weibull models contain an extra
shape parameter that is typically assumed constant. This
assumption is necessary for the PH or AFT assumption to hold
for these models. Stata provides a way of modeling the shape
parameter as a function of predictor variables by use of the
ancillary option in the streg command (see Chapter 7 under
the heading, "Other Parametric Models"). The following
code runs a log-logistic model in which the shape parameter
gamma is modeled as a function of CLINIC and lambda is modeled
as a function of PRISON and DOSE.

streg prison dose, dist(loglogistic) ancillary(clinic)

The output follows. 


Log-logistic regression -- accelerated failure-time form 

No. of subjects =         238          Number of obs   =       238
No. of failures =         150
Time at risk    =       95812
                                       LR chi2(2)      =     38.87
Log likelihood  =  -272.65273          Prob > chi2     =    0.0000

--------------------------------------------------------------------------
      _t |      Coef.   Std. Err.       z    P>|z|   [95% Conf. Interval]
---------+----------------------------------------------------------------
_t       |
  prison |  -.3275695   .1405119    -2.33   0.020   -.6029677   -.0521713
    dose |   .0328517   .0054275     6.05   0.000     .022214    .0434893
   _cons |   4.183173   .3311064    12.63   0.000    3.534216     4.83213
---------+----------------------------------------------------------------
ln_gam   |
  clinic |   .4558089   .1734819     2.63   0.009    .1157906    .7958273
   _cons |  -1.094496   .2212143    -4.95   0.000   -1.528068   -.6609238
--------------------------------------------------------------------------

Notice there is a parameter estimate for CLINIC as well as 
an intercept (_cons) under the heading ln_gam (the log of 
gamma). With this model, the estimate for gamma depends on 
whether CLINIC = 1 or CLINIC = 2. There is no easy 
interpretation for the predictor variables in this type of model, 
which is why it is not commonly used. However, for any specified 
value of PRISON, DOSE, and CLINIC, the hazard and survival 
functions can be estimated by substituting the parameter 
estimates into the expressions for the log-logistic hazard 
and survival functions. 
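
As a rough illustration of the substitution step, the following Python sketch (not Stata code; the function and variable names are ours) recovers gamma for each clinic from the ln_gam estimates in the output. It assumes that the log of gamma is linear in CLINIC, as the ln_gam block of the output suggests.

```python
import math

# Estimates copied from the ln_gam block of the streg output.
LN_GAM_CLINIC = 0.4558089
LN_GAM_CONS = -1.094496


def gamma_for_clinic(clinic):
    """Shape parameter: gamma = exp(_cons + b_clinic * CLINIC)."""
    return math.exp(LN_GAM_CONS + LN_GAM_CLINIC * clinic)


gamma_by_clinic = {c: gamma_for_clinic(c) for c in (1, 2)}
for clinic, gamma in gamma_by_clinic.items():
    print(f"CLINIC = {clinic}: gamma = {gamma:.4f}")
```

The two values differ noticeably, which is the sense in which the shape parameter is no longer constant across subjects in this model.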

Other distributions supported by streg are the generalized 
gamma, the lognormal, and the Gompertz distributions. 


9. RUNNING FRAILTY MODELS 

Frailty models contain an extra random component designed 
to account for individual level differences in the hazard 
otherwise unaccounted for by the model. The frailty α is a 
multiplicative effect on the hazard assumed to follow some 
distribution. The hazard function conditional on the frailty 
can be expressed as h(t|α) = α[h(t)]. 

Stata offers two choices for the distribution of the frailty: the 
gamma and the inverse-Gaussian, both of mean 1 and variance 
theta. The variance (theta) is a parameter estimated by 
the model. If theta = 0, then there is no frailty. 

For the first example, a Weibull PH model is run with 
PRISON, DOSE, and CLINIC as predictors. A gamma 
distribution is assumed for the frailty component. The code 
follows. 

streg dose prison clinic, dist(weibull) frailty(gamma) nohr 

The frailty() option requests that a frailty model be run. The 
output follows. 

Weibull regression -- log relative-hazard form 
                      Gamma frailty 

No. of subjects =         238          Number of obs   =       238
No. of failures =         150
Time at risk    =       95812
                                       LR chi2(4)      =     60.89
Log likelihood  =  -260.98467          Prob > chi2     =    0.0000

--------------------------------------------------------------------------
      _t |      Coef.   Std. Err.       z    P>|z|   [95% Conf. Interval]
---------+----------------------------------------------------------------
    dose |  -.0334635   .0062553    -5.35   0.000   -.0457237   -.0212034
  prison |   .3143786    .165953     1.89   0.058   -.0108833    .6396405
  clinic |  -.9714998   .2122909    -4.58   0.000   -1.387582   -.5554173
   _cons |  -5.624342   .6588994    -8.54   0.000   -6.915761   -4.332923
---------+----------------------------------------------------------------
   /ln_p |   .3149036   .0675772     4.66   0.000    .1824548    .4473525
 /ln_the |  -15.37947   722.4246    -0.02   0.983   -1431.306    1400.547
---------+----------------------------------------------------------------
       p |   1.370127   .0925893                      1.20016    1.564166
     1/p |   .7298592   .0493218                     .6393185    .8332223
   theta |   2.09e-07   .0001512                            0
--------------------------------------------------------------------------
Likelihood ratio test of theta = 0: chibar2(01) = 0.00 
                                    Prob>=chibar2 = 1.000 


Notice that there is one additional parameter (theta) 
compared to the model run in the previous section. The estimate 
for theta is 2.09 times 10^-7, or 0.000000209, which is 
essentially zero. A likelihood ratio test for the inclusion of theta is 
provided at the bottom of the output and yields a chi-square 
value of 0.00 and a p-value of 1.000. The frailty has no effect 
on the model and need not be included. 
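
Stata estimates the Weibull shape and the frailty variance on the log scale (the /ln_p and /ln_the rows) and then reports p, 1/p, and theta by exponentiating. A minimal Python sketch of that back-transformation, using the estimates printed in the output (the variable names are ours):

```python
import math

ln_p = 0.3149036      # /ln_p row of the output
ln_theta = -15.37947  # /ln_the row of the output

p = math.exp(ln_p)           # Weibull shape parameter
theta = math.exp(ln_theta)   # variance of the frailty

print(f"p = {p:.6f}, 1/p = {1 / p:.6f}, theta = {theta:.3g}")
```

Because theta is obtained by exponentiating, it can approach zero but never equal it, which is why the estimate appears as a very small positive number rather than exactly 0.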

The next model is the same as the previous one except that 
CLINIC is not included. One might expect a frailty component 
to play a larger role if an important covariate, such as CLINIC, 
is not included in the model. The code and output follow. 

streg dose prison, dist(weibull) frailty(gamma) nohr 

Weibull regression -- log relative-hazard form 
                      Gamma frailty 

No. of subjects =         238          Number of obs   =       238
No. of failures =         150
Time at risk    =       95812
                                       LR chi2(3)      =     36.00
Log likelihood  =  -273.42782          Prob > chi2     =    0.0000

--------------------------------------------------------------------------
      _t |      Coef.   Std. Err.       z    P>|z|   [95% Conf. Interval]
---------+----------------------------------------------------------------
    dose |  -.0358231    .010734    -3.34   0.001   -.0568614   -.0147849
  prison |   .2234556   .2141028     1.04   0.297   -.1961783    .6430894
   _cons |  -6.457393   .6558594    -9.85   0.000   -7.742854   -5.171932
---------+----------------------------------------------------------------
   /ln_p |   .2922832   .1217597     2.40   0.016    .0536385    .5309278
 /ln_the |  -2.849726   5.880123    -0.48   0.628   -14.37456    8.675104
---------+----------------------------------------------------------------
       p |   1.339482    .163095                     1.055103    1.700509
     1/p |   .7465571   .0909006                     .5880591    .9477747
   theta |   .0578602    .340225                     5.72e-07     5855.31
--------------------------------------------------------------------------
Likelihood ratio test of theta = 0: chibar2(01) = 0.03 
                                    Prob>=chibar2 = 0.432 






The variance (theta) of the frailty is estimated at 0.0578602. 
Although this estimate is not exactly zero as in the previous 
example, the p-value for the likelihood ratio test for theta is 
nonsignificant at 0.432. So the addition of frailty did not 
account for CLINIC being omitted from the model. 
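
The label chibar2(01) indicates that the test of theta = 0 lies on the boundary of the parameter space, so the reference distribution is a 50:50 mixture of a chi-square with 0 degrees of freedom and a chi-square with 1 degree of freedom. The Python sketch below computes a p-value under that assumption (the function name is ours). Because the printed statistic is rounded to two decimals, the result agrees with Stata's p-value only approximately.

```python
import math


def chibar2_01_pvalue(stat):
    """P-value under a 50:50 mixture of chi2(0) and chi2(1)."""
    if stat <= 0:
        return 1.0
    # The upper tail of a chi-square with 1 df is erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(stat / 2.0))


p_with_clinic = chibar2_01_pvalue(0.00)     # model including CLINIC
p_without_clinic = chibar2_01_pvalue(0.03)  # model omitting CLINIC

print(f"{p_with_clinic:.3f} {p_without_clinic:.3f}")
```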

Next the same model is run except that the inverse-Gaussian 
distribution is used for the frailty rather than the gamma 
distribution. The code and output follow. 

streg dose prison, dist(weibull) frailty(invgaussian) nohr 

Weibull regression -- log relative-hazard form 
                      Inverse-Gaussian frailty 

No. of subjects =         238          Number of obs   =       238
No. of failures =         150
Time at risk    =       95812
                                       LR chi2(3)      =     35.99
Log likelihood  =  -273.43201          Prob > chi2     =    0.0000

--------------------------------------------------------------------------
      _t |      Coef.   Std. Err.       z    P>|z|   [95% Conf. Interval]
---------+----------------------------------------------------------------
    dose |  -.0353908   .0096247    -3.68   0.000   -.0542549   -.0165268
  prison |   .2166456   .1988761     1.09   0.276   -.1731445    .6064356
   _cons |  -6.448779   .6494397    -9.93   0.000   -7.721658   -5.175901
---------+----------------------------------------------------------------
   /ln_p |   .2875567   .1122988     2.56   0.010    .0674551    .5076583
 /ln_the |  -3.137696   7.347349    -0.43   0.669   -17.53824    11.26284
---------+----------------------------------------------------------------
       p |   1.333166   .1497129                     1.069782    1.661396
     1/p |   .7500941   .0842347                     .6019034    .9347697
   theta |   .0433827   .3187475                     2.42e-08    77873.78
--------------------------------------------------------------------------
Likelihood ratio test of theta = 0: chibar2(01) = 0.02 
                                    Prob>=chibar2 = 0.443 


The p-value for the likelihood ratio test for theta is 0.443 (at 
the bottom of the output). The results in this example are very 
similar whether assuming the inverse-Gaussian or the gamma 
distribution for the frailty component. 

An example of shared frailty applied to recurrent event data 
is shown in the next section. 


10. MODELING RECURRENT EVENTS 

The modeling of recurrent events is illustrated with the 
bladder cancer dataset (bladder.dta) described at the start of this 
appendix. Recurrent events are represented in the data with 
multiple observations for subjects having multiple events. The 
data layout for the bladder cancer dataset is suitable for a 
counting process approach with time intervals defined for 
each observation (see Chapter 8). The following code prints 
the 12th through 20th observations, which contain 
information for four subjects. The code and output follow. 

list in 12/20 

       id   event   interval   start   stop   tx   num   size
12.    10       1          1       0     12    0     1      1
13.    10       1          2      12     16    0     1      1
14.    10       0          3      16     18    0     1      1
15.    11       0          1       0     23    0     3      3
16.    12       1          1       0     10    0     1      3
17.    12       1          2      10     15    0     1      3
18.    12       0          3      15     23    0     1      3
19.    13       1          1       0      3    0     1      1
20.    13       1          2       3     16    0     1      1


There are three observations for ID = 10, one observation for 
ID = 11, three observations for ID = 12, and two observations 
for ID = 13. The variables START and STOP represent the 
time interval for the risk period specific to that observation. 
The variable EVENT indicates whether an event (coded 1) 
occurred. The first three observations indicate that the subject 
with ID = 10 had an event at 12 months, another event at 16 
months, and was censored at 18 months. 

Before using Stata's survival commands, the stset command 
must be used to define the key survival variables. The code 
follows. 

stset stop, failure(event==1) id(id) time0(start) exit(time .) 

We have previously used the stset command on the addicts 
dataset, but more options from stset are included here. The 
id() option defines the subject variable (i.e., the cluster 
variable), the time0() option defines the variable that begins the 
time interval, and the exit(time .) option tells Stata that there 
is no imposed limit on the length of follow-up time for a given 
subject (e.g., subjects are not out of the risk set after their 
first event). With the stset command, Stata creates the 
variables _t0, _t, and _d, which Stata automatically recognizes 
as survival variables representing the time interval and event 
status. Actually the time0() option could have been omitted 
from this stset command and by default Stata would have 
created the starting time variable _t0 in the correct counting 
process format as long as the id() option was used (otherwise 
_t0 would default to zero). The following code (and output) 
lists the 12th through 20th observations with the newly created 
variables. 

list id _t0 _t _d tx in 12/20 

       id   _t0   _t   _d   tx
12.    10     0   12    1    0
13.    10    12   16    1    0
14.    10    16   18    0    0
15.    11     0   23    0    0
16.    12     0   10    1    0
17.    12    10   15    1    0
18.    12    15   23    0    0
19.    13     0    3    1    0
20.    13     3   16    1    0


A Cox model with recurrent events using the counting process 
approach can now be run with the stcox command. The 
predictors are treatment status (TX), initial number of tumors 
(NUM), and the initial size of tumors (SIZE). The robust 
option requests robust standard errors for the coefficient 
estimates. Omit the nohr option if you want the exponentiated 
coefficients. The code and output follow. 

stcox tx num size, nohr robust 

Cox regression -- Breslow method for ties 

No. of subjects =          85          Number of obs   =       190
No. of failures =         112
Time at risk    =        2711
                                       Wald chi2(3)    =     11.25
Log likelihood  =  -460.07958          Prob > chi2     =    0.0105

                   (standard errors adjusted for clustering on id)
--------------------------------------------------------------------------
      _t |                Robust
      _d |      Coef.   Std. Err.       z    P>|z|   [95% Conf. Interval]
---------+----------------------------------------------------------------
      tx |  -.4070966   .2432658    -1.67   0.094   -.8838889    .0696956
     num |   .1606478   .0572305     2.81   0.005    .0484781    .2728174
    size |  -.0400877   .0726459    -0.55   0.581    -.182471    .1022957
--------------------------------------------------------------------------

The interpretation of these parameter estimates is discussed 
in Chapter 8. 

A stratified Cox model can also be run using the data in this 
format with the variable INTERVAL as the stratified variable. 
The stratified variable indicates whether subjects were at risk 
for their 1st, 2nd, 3rd, or 4th event. This approach is called 
conditional 1 in Chapter 8 and is used if the investigator 
wants to distinguish the order in which recurrent events 
occur. The code and output follow. 

stcox tx num size, nohr robust strata(interval) 


Stratified Cox regr. -- Breslow method for ties 

No. of subjects =          85          Number of obs   =       190
No. of failures =         112
Time at risk    =        2711
                                       Wald chi2(3)    =      7.11
Log likelihood  =  -319.85912          Prob > chi2     =    0.0685

                   (standard errors adjusted for clustering on id)
--------------------------------------------------------------------------
      _t |                Robust
      _d |      Coef.   Std. Err.       z    P>|z|   [95% Conf. Interval]
---------+----------------------------------------------------------------
      tx |  -.3342955   .1982339    -1.69   0.092   -.7228268    .0542359
     num |   .1156526   .0502089     2.30   0.021     .017245    .2140603
    size |  -.0080508   .0604807    -0.13   0.894   -.1265908    .1104892
--------------------------------------------------------------------------
                                                  Stratified by interval 


Interaction terms between the treatment variable (TX) and 
the stratified variable could be created to examine whether 
the effect of treatment differed for the 1st, 2nd, 3rd, or 4th 
event. (Note that in this dataset subjects have a maximum of 
4 events). 

Another stratified approach (called conditional 2) is a slight 
variation of the conditional 1 approach. The difference is in 
the way the time intervals for the recurrent events are defined. 
There is no difference in the time intervals when subjects are 
at risk for their first event. However, with the conditional 2 
approach, the starting time at risk gets reset to zero for each 
subsequent event. The following code creates data suitable 
for using the conditional 2 approach. 


generate stop2 = _t - _t0 

stset stop2, failure(event==1) exit(time .) 

The generate command defines a new variable called STOP2 
representing the length of the time interval for each 
observation. The stset command is used with STOP2 as the outcome 
variable (_t). By default Stata sets the variable _t0 to zero. 
The following code (and output) lists the 12th through 20th 
observations for selected variables. 

list id _t0 _t _d tx in 12/20 

       id   _t0   _t   _d   tx
12.    10     0   12    1    0
13.    10     0    4    1    0
14.    10     0    2    0    0
15.    11     0   23    0    0
16.    12     0   10    1    0
17.    12     0    5    1    0
18.    12     0    8    0    0
19.    13     0    3    1    0
20.    13     0   13    1    0
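
The conditional 2 outcome is simply the gap time, that is, the length of each counting process interval. As a cross-check outside Stata, the short Python sketch below applies stop minus start to the intervals listed earlier for observations 12 through 20 and reproduces the _t column of the listing. The tuple layout is ours, chosen for illustration.

```python
# (id, start, stop) for observations 12-20 of the bladder cancer data.
rows = [
    (10, 0, 12), (10, 12, 16), (10, 16, 18),
    (11, 0, 23),
    (12, 0, 10), (12, 10, 15), (12, 15, 23),
    (13, 0, 3), (13, 3, 16),
]

# Conditional 2: the clock restarts at zero after each event,
# so the outcome is the time elapsed since the previous event.
gap_times = [stop - start for _, start, stop in rows]
print(gap_times)
```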


Notice that the id() option was not used with the stset 
command for the conditional 2 approach. This means that Stata 
does not know that multiple observations correspond to the 
same subject. However the cluster() option can be used 
directly in the stcox command to request that the analysis be 
clustered by ID (i.e., by subject). The following code runs a 
stratified Cox model using the conditional 2 approach with the 
cluster() and robust options. The code and output follow. 


stcox tx num size, nohr robust strata(interval) cluster(id) 


Stratified Cox regr. -- Breslow method for ties 

No. of subjects =         190          Number of obs   =       190
No. of failures =         112
Time at risk    =        2711
                                       Wald chi2(3)    =     11.99
Log likelihood  =  -363.16022          Prob > chi2     =    0.0074

                   (standard errors adjusted for clustering on id)
--------------------------------------------------------------------------
      _t |                Robust
      _d |      Coef.   Std. Err.       z    P>|z|   [95% Conf. Interval]
---------+----------------------------------------------------------------
      tx |  -.2695213   .2093108    -1.29   0.198   -.6797628    .1407203
     num |   .1535334   .0491803     3.12   0.002    .0571418    .2499249
    size |   .0068402   .0625862     0.11   0.913   -.1158265     .129507
--------------------------------------------------------------------------
                                                  Stratified by interval 

The results using the conditional 1 and conditional 2 
approaches vary slightly. 

Next we demonstrate how a shared frailty model can be 
applied to recurrent event data. Frailty is included in recurrent 
event analyses to account for variability due to unobserved 
subject specific factors that may lead to within-subject 
correlation. 

Before running the model, we rerun the stset command 
shown earlier in this section to get the data back to the form 
suitable for a counting process approach. The code follows. 

stset stop, failure(event==1) id(id) time0(start) exit(time .) 

Next a parametric Weibull model is run with a gamma 
distributed shared frailty component using the streg command. 
We use the same three predictors for comparability with the 
other models presented in this section. The code follows. 

streg tx num size, dist(weibull) frailty(gamma) shared(id) nohr 

The dist() option requests the distribution for the parametric 
model. The frailty() option requests the distribution for the 
frailty and the shared() option defines the cluster variable, 
ID. For this model, observations from the same subject share 
the same frailty. The output follows. 


Weibull regression -- log relative-hazard form 
                      Gamma shared frailty 

No. of subjects =          85          Number of obs   =       190
No. of failures =         112
Time at risk    =        2711
                                       LR chi2(3)      =      8.04
Log likelihood  =  -184.73658          Prob > chi2     =    0.0453

--------------------------------------------------------------------------
      _t |      Coef.   Std. Err.       z    P>|z|   [95% Conf. Interval]
---------+----------------------------------------------------------------
      tx |  -.4583219   .2677275    -1.71   0.087   -.9830582    .0664143
     num |   .1847305   .0724134     2.55   0.011    .0428028    .3266581
    size |  -.0314314   .0911134    -0.34   0.730   -.2100104    .1471476
   _cons |  -2.952397   .4174276    -7.07   0.000    -3.77054   -2.134254
---------+----------------------------------------------------------------
   /ln_p |  -.1193215   .0898301    -1.33   0.184   -.2953852    .0567421
 /ln_the |  -.7252604   .5163027    -1.40   0.160   -1.737195    .2866742
---------+----------------------------------------------------------------
       p |   .8875224   .0797262                     .7442449    1.058383
     1/p |   1.126732   .1012144                     .9448377    1.343644
   theta |   .4841985    .249993                     .1760134     1.33199
--------------------------------------------------------------------------
Likelihood ratio test of theta = 0: chibar2(01) = 7.34 
                                    Prob>=chibar2 = 0.003 

The model output is discussed in Chapter 8. 

The counting process data layout with multiple observations 
per subject need not only apply to recurrent event data but 
can also be used for a more conventional survival analysis in 
which each subject is limited to one event. A subject with four 
observations may be censored for the first three observations 
before getting the event in the time interval represented by the 
fourth observation. This data layout is particularly suitable 
for representing time-varying exposures, which may change 
values over different intervals of time (see the stsplit 
command in Section 7 of this appendix). 


B. SAS 

Analyses are carried out in SAS by using the appropriate SAS 
procedure on a SAS dataset. The key SAS procedures for 
performing survival analyses are as follows. 

PROC LIFETEST. This procedure is used to obtain 
Kaplan-Meier survival estimates and plots. It can also be used to output 
life table estimates and plots. It will generate output for the 
log rank and Wilcoxon test statistics if stratifying by a 
covariate. A new SAS dataset containing survival estimates can be 
requested. 

PROC PHREG. This procedure is used to run the Cox 
proportional hazards model, a stratified Cox model, and an extended 
Cox model with time-varying covariates. It can also be used to 
create a SAS dataset containing adjusted survival estimates. 
These adjusted survival estimates can then be plotted using 
PROC GPLOT. 

PROC LIFEREG. This procedure is used to run parametric 
accelerated failure time (AFT) models. 

Analyses on the addicts dataset are used to illustrate these 
procedures. The addicts dataset was obtained from a 1991 
Australian study by Caplehorn et al., and contains information 
on 238 heroin addicts. The study compared two methadone 
treatment clinics to assess patient time remaining under 
methadone treatment. The two clinics differed according to 
live-in policies for patients. A patient’s survival time was 
determined as the time (in days) until the person dropped out 
of the clinic or was censored. The variables are defined at the 
start of this appendix. 

All of the SAS programming code is written in capital letters 
for readability. However, SAS is not case sensitive. If a 
program is written with lowercase letters, SAS reads them as 
uppercase. The number of spaces between words (if more than 
one) has no effect on the program. Each SAS programming 
statement ends with a semicolon. 

The addicts dataset is stored as a permanent SAS dataset 
called addicts.sas7bdat. A LIBNAME statement is needed 
to indicate the path to the location of the SAS dataset. In our 
examples, we assume the file is located on the A drive (i.e., on 
a disk). The LIBNAME statement includes a reference name 
as well as the path. We call the reference name REF. The code 
is as follows. 

LIBNAME REF 'A:\'; 

The user is free to define his or her own reference name. The 
path to the location of the file is given between the quotation 
marks. The general form of the code is 

LIBNAME Your reference name 'Your path to file location'; 

PROC CONTENTS, PROC PRINT, PROC UNIVARIATE, 
PROC FREQ, and PROC MEANS can be used to list or 
describe the data. SAS code can be run in one batch or 
highlighted and submitted one procedure at a time. Code can be 
submitted by clicking on the submit button on the toolbar 
in the editor window. The code for using these procedures 
follows (output omitted). 

PROC CONTENTS DATA=REF.ADDICTS;RUN; 

PROC PRINT DATA=REF.ADDICTS;RUN; 

PROC UNIVARIATE DATA=REF.ADDICTS;VAR SURVT;RUN; 

PROC FREQ DATA=REF.ADDICTS;TABLES CLINIC PRISON;RUN; 

PROC MEANS DATA=REF.ADDICTS;VAR SURVT;CLASS CLINIC;RUN; 

Notice that each SAS statement ends with a semicolon. If each 
procedure is submitted one at a time then each procedure 
must end with a RUN statement. Otherwise one RUN 
statement at the end of the last procedure is sufficient. With the 
LIBNAME statement, SAS recognizes a two-level file name: 
the reference name and the file name without an extension. 
For our example, the SAS file name is REF.ADDICTS. 
Alternatively, a temporary SAS dataset could be created and used 
for these procedures. 

Text that you do not wish SAS to process should be written as 
a comment: 

/* A comment begins with a forward slash followed 
by a star and ends with a star followed by a 
forward slash. */ 

*A comment can also be created by beginning with a 
star and ending with a semicolon; 

The survival analyses demonstrated in SAS are: 

1. Demonstrating PROC LIFETEST to obtain Kaplan-Meier 
and life table survival estimates (and plots); 

2. Running a Cox PH model with PROC PHREG; 

3. Running a stratified Cox model; 

4. Assessing the PH assumption with a statistical test; 

5. Obtaining Cox adjusted survival curves; 

6. Running an extended Cox model (i.e., containing time- 
varying covariates); 

7. Running parametric models with PROC LIFEREG; and 

8. Modeling recurrent events. 

1. DEMONSTRATING PROC LIFETEST TO OBTAIN 
KAPLAN-MEIER AND LIFE TABLE SURVIVAL 
ESTIMATES (AND PLOTS) 

PROC LIFETEST produces Kaplan-Meier survival estimates 
with the METHOD=KM option. The PLOTS=(S) option plots 
the estimated survival function. The TIME statement defines 
the time-to-event variable (SURVT) and the value for 
censorship (STATUS = 0). The code follows (output omitted). 

PROC LIFETEST DATA=REF.ADDICTS METHOD=KM PLOTS=(S); 
TIME SURVT*STATUS(0); 
RUN; 


Use a STRATA statement in PROC LIFETEST to compare 
survival estimates for different groups (e.g., strata clinic). The 
PLOTS=(S,LLS) option produces log-log curves as well as 
survival curves. If the proportional hazard assumption is met the 
log-log survival curves will be parallel. The STRATA statement 
also provides the log rank test and Wilcoxon test statistics. The 
code follows. 

PROC LIFETEST DATA=REF.ADDICTS METHOD=KM PLOTS=(S,LLS); 
TIME SURVT*STATUS(0); 
STRATA CLINIC; 
RUN; 

PROC LIFETEST yields the following edited output. 

The LIFETEST Procedure (stratified) 

Stratum 1: CLINIC = 1 

Product-Limit Survival Estimates 

                                 Survival
                                 Standard    Number    Number
   SURVT    Survival   Failure      Error    Failed      Left
    0.00      1.0000         0          0         0       163
    2.00*          .         .          .         0       162
    7.00      0.9938   0.00617    0.00615         1       161
   17.00      0.9877    0.0123    0.00868         2       160
     ...
  836.00      0.0869    0.9131     0.0295       118         6
  837.00      0.0725    0.9275     0.0279       119         5

Stratum 2: CLINIC = 2 

Product-Limit Survival Estimates 

                                 Survival
                                 Standard    Number    Number
   SURVT    Survival   Failure      Error    Failed      Left
    0.00      1.0000         0          0         0        75
    2.00*          .         .          .         0        74
   13.00      0.9865    0.0135     0.0134         1        73
   26.00      0.9730    0.0270     0.0189         2        72

Test of Equality over Strata 

                                       Pr >
Test         Chi-Square    DF    Chi-Square
Log-Rank        27.8927     1        <.0001
Wilcoxon        11.6268     1        0.0007
-2Log(LR)       26.0236     1        <.0001

Both the log rank and Wilcoxon tests yield highly significant 
chi-square test statistics. The Wilcoxon test is a variation of 
the log rank test weighting the observed minus expected score 
of the jth failure time by n_j (the number still at risk at the jth 
failure time). 

The requested log-log plots from PROC LIFETEST follow. 



SAS (as well as Stata) plots log(survival time) rather than 
survival time on the horizontal axis by default for log-log curves. 
As far as checking the parallel assumption, it does not matter 
if log(survival time) or survival time is on the horizontal axis. 
However, if the log-log survival curves look like straight lines 
with log(survival time) on the horizontal axis, then there is 
evidence that the “time-to-event” variable follows a Weibull 
distribution. If the slope of the line equals one, then there is 
evidence that the survival time variable follows an 
exponential distribution, a special case of the Weibull distribution. For 
these situations, a parametric survival model can be used. 
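
The straight-line property can be checked numerically. Assuming the Weibull survival function is written S(t) = exp(-lambda t^p), taking logs twice gives ln(-ln S(t)) = ln(lambda) + p ln(t), a line in ln(t) with slope p. The Python sketch below uses hypothetical values of lambda and p (not estimates from the addicts data) and confirms that the slope between successive points is constant.

```python
import math


def log_log_survival(t, lam, p):
    """ln(-ln S(t)) for a Weibull with S(t) = exp(-lam * t**p)."""
    survival = math.exp(-lam * t ** p)
    return math.log(-math.log(survival))


lam, p = 0.002, 1.37  # hypothetical values, for illustration only
times = [10, 50, 100, 400]
points = [(math.log(t), log_log_survival(t, lam, p)) for t in times]

# A constant slope between consecutive points means a straight line.
slopes = [
    (y2 - y1) / (x2 - x1)
    for (x1, y1), (x2, y2) in zip(points, points[1:])
]
print([round(s, 4) for s in slopes])
```

Setting p = 1 in the same function gives slopes of one, the exponential special case mentioned above.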

You can gain more control over what and how variables are 
plotted by creating a dataset that contains the survival 
estimates. Use the OUTSURV= option in the PROC LIFETEST 
statement to create a SAS dataset containing the KM survival 
estimates. The option OUTSURV=DOG creates a dataset 
called dog (make up your own name) containing the survival 
estimates in a variable called SURVIVAL. The code follows. 

PROC LIFETEST DATA=REF.ADDICTS METHOD=KM OUTSURV=DOG; 
TIME SURVT*STATUS(0); 
STRATA CLINIC; 
RUN; 

Data dog contains the survival estimates but not the 
log(-log) of the survival estimates. Data cat is created in 
the following code from data dog (using the statement SET 
DOG) and defines a new log-log variable called LLS. 

DATA CAT; 

SET DOG; 

LLS=LOG(-LOG(SURVIVAL)) ; 

RUN; 

In SAS, the LOG function returns the natural log, not the log 
base 10. 

PROC PRINT prints the data in the output window. 

PROC PRINT DATA=CAT; RUN; 

The first 10 observations from PROC PRINT are listed below. 


Obs   CLINIC   SURVT   _CENSOR_   SURVIVAL        LLS
  1        1       0          0    1.00000          .
  2        1       2          1    1.00000          .
  3        1       7          0    0.99383   -5.08450
  4        1      17          0    0.98765   -4.38824
  5        1      19          0    0.98148   -3.97965
  6        1      28          1    0.98148   -3.97965
  7        1      28          1    0.98148   -3.97965
  8        1      29          0    0.97523   -3.68561
  9        1      30          0    0.96898   -3.45736
 10        1      33          0    0.96273   -3.27056


The PLOT LLS*SURVT=CLINIC statement plots the variable 
LLS (the log-log survival variable) on the vertical axis and 
SURVT on the horizontal axis, stratified by CLINIC. The 
SYMBOL option can be used to choose plotting colors for each 
level of clinic. The code and output for plotting the log-log 
curves by CLINIC follow. 

SYMBOL COLOR=BLUE; 
SYMBOL2 COLOR=RED; 

PROC GPLOT DATA=CAT; 
PLOT LLS*SURVT=CLINIC; 
RUN; 

[Figure: log-log survival curves, LLS (vertical axis) versus SURVT (horizontal axis), by CLINIC] 



The plot has survival time (in days) rather than the default 
log(survival time). The log-log survival plots look parallel for 
CLINIC the first 365 days but then seem to diverge. This 
information can be utilized when developing an approach to 
modeling CLINIC with a time-dependent variable in an 
extended Cox model. 

You can also obtain survival estimates using life tables. This 
method is useful if you do not have individual level survival 
information but rather have group survival information for 
specified time intervals. The user determines the time 
intervals using the INTERVALS= option. The code follows (output 
omitted). 

PROC LIFETEST DATA=REF.ADDICTS 
METHOD=LT INTERVALS= 50 100 150 
200 TO 1000 BY 100 PLOTS=(S); 
TIME SURVT*STATUS(0); 
RUN; 


2. RUNNING A COX PROPORTIONAL HAZARD 
MODEL WITH PROC PHREG 

PROC PHREG is used to request a Cox proportional hazards 
model. The code follows. 


PROC PHREG DATA=REF.ADDICTS; 
MODEL SURVT*STATUS(0)= PRISON DOSE CLINIC; 
RUN; 

The code SURVT*STATUS(0) in the MODEL statement 
specifies the time-to-event variable (SURVT) and the value 
for censorship (STATUS = 0). Three predictors are included 
in the model: PRISON, DOSE, and CLINIC. The PH 
assumption is assumed to hold for each of these predictors (perhaps 
incorrectly). The output produced by PROC PHREG follows. 


The PHREG Procedure 

Model Information 

Data Set              REF.ADDICTS
Dependent Variable    SURVT     survival time in days
Censoring Variable    STATUS    status (0=censored, 1=endpoint)
Censoring Value(s)    0
Ties Handling         BRESLOW

Summary of the Number of Event and Censored Values 

                                Percent
Total    Event    Censored     Censored
  238      150          88        36.97

Model Fit Statistics 

                Without          With
Criterion    Covariates    Covariates
-2 LOG L       1411.324      1346.805
AIC            1411.324      1352.805
SBC            1411.324      1361.837

Analysis of Maximum Likelihood Estimates 

               Parameter   Standard                             Hazard
Variable   DF   Estimate      Error   Chi-Square   Pr > ChiSq    Ratio   Variable Label
PRISON      1    0.32650    0.16722       3.8123       0.0509    1.386   0=none, 1=prison record
DOSE        1   -0.03540    0.00638      30.7844       <.0001    0.965   methadone dose (mg/day)
CLINIC      1   -1.00876    0.21486      22.0419       <.0001    0.365   Coded 1 or 2


The table above lists the parameter estimates for the 
regression coefficients, their standard errors, a Wald chi-square test 
statistic for each predictor, and corresponding p-value. The 
column labeled HAZARD RATIO gives the estimated hazard 
ratio per one-unit change in each predictor by exponentiating 
the estimated regression coefficients. 

You can use the TIES=EXACT option in the model statement 
rather than run the default TIES=BRESLOW option that was 
used in the previous model. The TIES=EXACT option is a 
computationally intensive method to handle events that occur 
at the same time. If many events occur simultaneously in the 
data then the TIES=EXACT option is preferred. Otherwise, 
the difference between this option and the default is slight. 
The option RL in the MODEL statement of PROC PHREG 
provides 95% confidence intervals for the hazard ratio 
estimates. 


PROC PHREG DATA=REF.ADDICTS; 

MODEL SURVT*STATUS(0)= PRISON DOSE CLINIC/TIES=EXACT RL; 
RUN; 


The output is shown below. 


The PHREG Procedure 

Model Information 

Data Set              REF.ADDICTS
Dependent Variable    SURVT     survival time in days
Censoring Variable    STATUS    status (0=censored, 1=endpoint)
Censoring Value(s)    0
Ties Handling         EXACT

Analysis of Maximum Likelihood Estimates 

               Parameter   Standard                             Hazard   95% Hazard Ratio
Variable   DF   Estimate      Error   Chi-Square   Pr > ChiSq    Ratio   Confidence Limits
PRISON      1    0.32657    0.16723       3.8135       0.0508    1.386     0.999     1.924
DOSE        1   -0.03537    0.00638      30.7432       <.0001    0.965     0.953     0.977
CLINIC      1   -1.00980    0.21488      22.0832       <.0001    0.364     0.239     0.555


The parameter estimates and their standard errors vary only 
slightly from the previous model without the TIES=EXACT 
option. Notice the type of ties handling approach is listed in 
the table called MODEL INFORMATION in the output. 
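
The hazard ratios and their confidence limits follow directly from the parameter estimates and standard errors. The Python sketch below (a cross-check, not SAS code) assumes the usual Wald form exp(estimate ± 1.96 × standard error) and reproduces the Hazard Ratio and Confidence Limits columns of the TIES=EXACT output to three decimals.

```python
import math

# (parameter estimate, standard error) from the TIES=EXACT output.
estimates = {
    "PRISON": (0.32657, 0.16723),
    "DOSE": (-0.03537, 0.00638),
    "CLINIC": (-1.00980, 0.21488),
}

Z = 1.96  # approximate 97.5th percentile of the standard normal

results = {}
for name, (b, se) in estimates.items():
    hazard_ratio = math.exp(b)
    lower = math.exp(b - Z * se)
    upper = math.exp(b + Z * se)
    results[name] = (round(hazard_ratio, 3), round(lower, 3), round(upper, 3))
    print(name, results[name])
```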

Suppose we wish to assess interaction between CLINIC and 
PRISON and between CLINIC and DOSE. We can define two 
interaction terms in a new temporary SAS dataset (called 
addicts2) and then run a model containing those terms. 
Product terms for CLINIC times PRISON (called CLIN_PR) and 
CLINIC times DOSE (called CLIN_DO) are defined in the 
following data step. 

DATA ADDICTS2; 
SET REF.ADDICTS; 
CLIN_PR=CLINIC*PRISON; 
CLIN_DO=CLINIC*DOSE; 
RUN; 


The interaction terms (called CLIN_PR and CLIN_DO) are 
added to the model. 

PROC PHREG DATA=ADDICTS2; 

MODEL SURVT*STATUS(0)=PRISON DOSE CLINIC CLIN_PR CLIN_DO; 
RUN; 


The PROC PHREG output follows. 


The PHREG Procedure 

Model Information 

Data Set              WORK.ADDICTS2
Dependent Variable    SURVT     survival time in days
Censoring Variable    STATUS    status (0=censored, 1=endpoint)
Censoring Value(s)    0
Ties Handling         BRESLOW

Model Fit Statistics 

                Without          With
Criterion    Covariates    Covariates
-2 LOG L       1411.324      1343.199
AIC            1411.324      1353.199
SBC            1411.324      1368.253

Analysis of Maximum Likelihood Estimates 

               Parameter   Standard                             Hazard
Variable   DF   Estimate      Error   Chi-Square   Pr > ChiSq    Ratio
PRISON      1    1.19200    0.54137       4.8480       0.0277    3.294
DOSE        1   -0.01932    0.01935       0.9967       0.3181    0.981
CLINIC      1    0.17469    0.89312       0.0383       0.8449    1.191
CLIN_PR     1   -0.73799    0.43149       2.9253       0.0872    0.478
CLIN_DO     1   -0.01386    0.01433       0.9359       0.3333    0.986





The estimates of the hazard ratios (the Hazard Ratio column) may be
deceptive when product terms are in the model. For example,
exponentiating the estimated coefficient for PRISON gives
exp(1.19200) = 3.294, the estimated hazard ratio
for PRISON = 1 vs. PRISON = 0 where DOSE = 0 and
CLINIC = 0. This is a meaningless hazard ratio because
CLINIC is coded 1 or 2 and DOSE is always greater than zero
(all patients are on methadone).

The Wald chi-square p-values for the two product terms are
0.0872 for CLIN_PR and 0.3333 for CLIN_DO. Alternatively,
a likelihood ratio test can simultaneously test both product
terms by subtracting the -2 log likelihood statistic for the full
model (with the two product terms) from that for the reduced
model (without the product terms). The -2 log likelihood statistic
can be found on the output in the table called MODEL FIT
STATISTICS under the column called WITH COVARIATES.
The -2 log likelihood statistic is 1343.199 for the full
model and 1346.805 for the reduced model, so the test statistic
is 1346.805 - 1343.199 = 3.606. The test is a 2-degree-of-freedom
test because 2 product terms are simultaneously tested.

The PROBCHI function in SAS can be used to obtain p-values 
for chi-square tests. The code follows. 

DATA TEST;

REDUCED = 1346.805;

FULL = 1343.199;

DF = 2;

P_VALUE = 1 - PROBCHI(REDUCED-FULL,DF);

RUN;

PROC PRINT DATA=TEST;RUN;

Note that you must write 1 minus the PROBCHI function to
obtain the area under the right tail of the chi-square probability
density function. The output from the PROC PRINT
follows.


Obs    REDUCED    FULL       DF    P_VALUE

1      1346.81    1343.20    2     0.16480


The p-value for the likelihood ratio test for both product terms 
is 0.16480. 
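
As a rough cross-check of the PROBCHI result outside SAS, the sketch below uses Python, which is our choice for illustration and not part of the SAS session. It relies on the fact that a chi-square variable with 2 degrees of freedom has upper-tail probability exp(-x/2), so it applies only to 2-df tests. The function name lr_test_pvalue_2df is hypothetical.

```python
import math

def lr_test_pvalue_2df(neg2logl_reduced, neg2logl_full):
    """Likelihood ratio statistic and its upper-tail p-value for a 2-df test.

    Valid only for 2 degrees of freedom, where P(X > x) = exp(-x / 2).
    """
    statistic = neg2logl_reduced - neg2logl_full
    return statistic, math.exp(-statistic / 2.0)

# -2 log likelihood values taken from the two PHREG fits described above
statistic, p_value = lr_test_pvalue_2df(1346.805, 1343.199)
print(f"LR statistic = {statistic:.3f}, p-value = {p_value:.5f}")
```

The printed p-value should agree with the value reported by PROC PRINT. For other degrees of freedom a general chi-square routine would be needed.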

3. RUNNING A STRATIFIED COX MODEL

Suppose we believe that the variable CLINIC violates the
proportional hazards assumption but the variables PRISON and
DOSE follow the PH assumption within each level of CLINIC.
A stratified Cox model on the variable CLINIC can be run with
PROC PHREG using the STRATA CLINIC statement.

PROC PHREG DATA=REF.ADDICTS;

MODEL SURVT*STATUS(0)=PRISON DOSE;
STRATA CLINIC;

RUN;


The output of the parameter estimates follows. 

The PHREG Procedure

Analysis of Maximum Likelihood Estimates

                 Parameter    Standard                               Hazard
Variable    DF   Estimate     Error       Chi-Square   Pr > ChiSq    Ratio

PRISON      1     0.38877     0.16892     5.2974       0.0214        1.475
DOSE        1    -0.03514     0.00647     29.5471      <.0001        0.965


Notice there is no parameter estimate for CLINIC because
CLINIC is the stratification variable. The hazard ratio for
PRISON = 1 vs. PRISON = 0 is estimated at 1.475. This hazard
ratio is assumed not to depend on CLINIC because an
interaction term between PRISON and CLINIC was not
included in the model.

Suppose we wish to assess interaction between PRISON and
CLINIC as well as DOSE and CLINIC in a Cox model stratified
by CLINIC. We can define interaction terms in a new SAS
dataset (called addicts2) and then run a model containing
these terms. Notice that when we stratify by CLINIC we do
not put the variable CLINIC in the model statement. However,
the interaction terms CLIN_PR and CLIN_DO are put in the
model statement and CLINIC is put in the strata statement.

DATA ADDICTS2;

SET REF.ADDICTS;

CLIN_PR=CLINIC*PRISON;

CLIN_DO=CLINIC*DOSE;

RUN;

PROC PHREG DATA=ADDICTS2;

MODEL SURVT*STATUS(0)=PRISON DOSE CLIN_PR CLIN_DO;
STRATA CLINIC;

RUN;


The output of the parameter estimates follows.


The PHREG Procedure

Analysis of Maximum Likelihood Estimates

                 Parameter    Standard                               Hazard
Variable    DF   Estimate     Error       Chi-Square   Pr > ChiSq    Ratio

PRISON      1     1.08716     0.53861     4.0741       0.0435        2.966
DOSE        1    -0.03482     0.01980     3.0930       0.0786        0.966
CLIN_PR     1    -0.58467     0.42813     1.8650       0.1721        0.557
CLIN_DO     1    -0.00105     0.01457     0.0052       0.9427        0.999


Note with the interaction model that the hazard ratio for
PRISON = 1 vs. PRISON = 0 for CLINIC = 1 controlling for
DOSE is $\exp(\beta_1 + \beta_3)$, and the hazard ratio for PRISON =
1 vs. PRISON = 0 for CLINIC = 2 controlling for DOSE is
$\exp(\beta_1 + 2\beta_3)$, where $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ are the
coefficients of PRISON, DOSE, CLIN_PR, and CLIN_DO. This latter
calculation is obtained by substituting the appropriate values
into the hazard in both the numerator (for PRISON = 1) and
denominator (for PRISON = 0) (see below).

$$
\mathrm{HR} = \frac{h_0(t)\exp[1\beta_1 + \beta_2\,\mathrm{DOSE} + (2)(1)\beta_3 + \beta_4\,\mathrm{CLIN\_DO}]}{h_0(t)\exp[0\beta_1 + \beta_2\,\mathrm{DOSE} + (2)(0)\beta_3 + \beta_4\,\mathrm{CLIN\_DO}]} = \exp(\beta_1 + 2\beta_3)
$$

By substituting in the parameter estimates we obtain
an estimated hazard ratio for PRISON of exp(1.08716 +
2(-0.58467)) = 0.921 among those at CLINIC = 2.
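
The substitution can be checked numerically for both clinics. The following is a minimal sketch in Python (used here only as a calculator outside SAS); the variable and function names are hypothetical, and the coefficients are copied from the stratified interaction model output.

```python
import math

# Coefficient estimates copied from the stratified interaction model output
beta_prison = 1.08716    # PRISON
beta_clin_pr = -0.58467  # CLIN_PR = CLINIC * PRISON

def prison_hazard_ratio(clinic):
    """HR for PRISON = 1 vs. PRISON = 0 at a given CLINIC code (1 or 2)."""
    return math.exp(beta_prison + clinic * beta_clin_pr)

hr_clinic1 = prison_hazard_ratio(1)
hr_clinic2 = prison_hazard_ratio(2)
print(f"CLINIC = 1: {hr_clinic1:.3f}")
print(f"CLINIC = 2: {hr_clinic2:.3f}")
```

Because CLINIC is coded 1 or 2, the product-term coefficient enters once for CLINIC = 1 and twice for CLINIC = 2.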

An alternative approach allowing for interaction with CLINIC 
and the other covariates is obtained by running two models: 
one subset on the observations where CLINIC = 1 and the 
other subset on the observations where CLINIC = 2. The code 
and output follow. 


PROC PHREG DATA=ADDICTS2; 

MODEL SURVT*STATUS(0)=PRISON DOSE; 

WHERE CLINIC=1; 

TITLE COX MODEL RUN ONLY ON DATA WHERE CLINIC=1; 
RUN; 

A WHERE statement in a SAS procedure limits the number
of observations for analyses. A TITLE statement can
also be added to the procedure. The output containing
the parameter estimates subset on the observations where
CLINIC = 1 follows.

COX MODEL RUN ONLY ON DATA WHERE CLINIC=1
Analysis of Maximum Likelihood Estimates

                 Parameter    Standard                               Hazard
Variable    DF   Estimate     Error       Chi-Square   Pr > ChiSq    Ratio

PRISON      1     0.50249     0.18869     7.0918       0.0077        1.653
DOSE        1    -0.03586     0.00774     21.4761      <.0001        0.965


Similarly, the code and output follow containing the parameter
estimates subset on the observations where CLINIC = 2.


PROC PHREG DATA=ADDICTS2;

MODEL SURVT*STATUS(0)=PRISON DOSE;

WHERE CLINIC=2;

TITLE COX MODEL RUN ONLY ON DATA WHERE CLINIC=2;
RUN;


COX MODEL RUN ONLY ON DATA WHERE CLINIC=2
Analysis of Maximum Likelihood Estimates

            Parameter    Standard                               Hazard
Variable    Estimate     Error       Chi-Square   Pr > ChiSq    Ratio    Variable Label

PRISON      -0.08226     0.38430     0.0458       0.8305        0.921    0=none, 1=prison record
DOSE        -0.03693     0.01234     8.9500       0.0028        0.964    methadone dose (mg/day)


The estimated hazard ratio for PRISON = 1 vs. PRISON = 0 
is 0.921 among CLINIC = 2 controlling for DOSE. This result 
is consistent with the stratified Cox model previously run in 
which all the product terms with CLINIC were included in 
the model. 

4. ASSESSING THE PH ASSUMPTION WITH
A STATISTICAL TEST

The following SAS program makes use of the addicts dataset
to demonstrate how a statistical test of the proportional hazard
assumption is performed for a given covariate (Harrell and
Lee, 1986). This is accomplished by finding the correlation
between the Schoenfeld residuals for a particular covariate and
the ranking of individual failure times. If the proportional
hazard assumption is met then the correlation should be near
zero.

The p-value for testing this correlation can be obtained from
PROC CORR (or PROC REG). The Schoenfeld residuals for
a given model can be saved in a SAS dataset using PROC
PHREG. The ranking of events by failure time can be saved
in a SAS dataset using PROC RANK. The null hypothesis
is that the PH assumption is not violated.

First run the full model. The OUTPUT statement creates a
SAS dataset, the OUT= option names that output dataset, and
the RESSCH= option is followed by user-defined variable
names so that the output dataset contains the Schoenfeld
residuals. The order of the names corresponds to the order
of the independent variables in the model statement. The
actual variable names are arbitrary. The name we chose for the
dataset is RESID and the names we chose for the variables
containing the Schoenfeld residuals for CLINIC, PRISON,
and DOSE are RCLINIC, RPRISON, and RDOSE. The code
follows.

PROC PHREG DATA=REF.ADDICTS;

MODEL SURVT*STATUS(0) = CLINIC PRISON DOSE;

OUTPUT OUT=RESID RESSCH=RCLINIC RPRISON RDOSE;

RUN;


PROC PRINT DATA=RESID;RUN;

The first 10 observations of the PROC PRINT are shown below.
The three columns on the right are the variables containing
the Schoenfeld residuals. They are missing (shown as a period)
for observation 8, which is censored.


Obs   SURVT   STATUS   CLINIC   PRISON   DOSE    RCLINIC    RPRISON      RDOSE

  1     428      1        1        0      50    -0.18715   -0.40641    -8.2100
  2     275      1        1        1      55    -0.15841    0.55485    -2.6277
  3     262      1        1        0      55    -0.16453   -0.45197    -2.5635
  4     183      1        1        0      30    -0.14577   -0.48727   -26.0823
  5     259      1        1        1      65    -0.16306    0.54313     7.3701
  6     714      1        1        0      55    -0.25853   -0.50074    -8.5347
  7     438      1        1        1      65    -0.19292    0.58106     6.6072
  8     796      0        1        1      60       .          .          .
  9     892      1        1        0      50    -0.34478   -0.22372   -15.9088
 10     393      1        1        1      65    -0.17712    0.57376     6.6886


Next, create a SAS dataset that deletes censored observations 
(i.e., only contains observations that fail). 

DATA EVENTS; 

SET RESID; 

IF STATUS=1; 

RUN; 

Use PROC RANK to create a dataset containing a variable
that ranks the order of failure times. The user supplies the
name of the output data set using the OUT= option. The variable
to be ranked is SURVT (the survival time variable). The
RANKS statement precedes a user-defined variable name for
the rankings of failure times. The user-defined names are
arbitrary. The name we chose for this variable is TIMERANK.
The code follows.

PROC RANK DATA=EVENTS OUT=RANKED TIES=MEAN;
VAR SURVT;

RANKS TIMERANK;

RUN;

PROC PRINT DATA=RANKED;RUN;

PROC CORR is used to get the correlations between the
ranked failure time variable (called TIMERANK in this example)
and the variables containing the Schoenfeld residuals of
CLINIC, PRISON, and DOSE (called RCLINIC, RPRISON,
and RDOSE, respectively, in this example). The NOSIMPLE
option suppresses the printing of summary statistics. If the
proportional hazard assumption is met for a particular
covariate, then the correlation should be near zero. The p-value
obtained from PROC CORR, which tests whether this correlation
is zero, is the p-value for testing the proportional hazard
assumption. The code follows.

PROC CORR DATA=RANKED NOSIMPLE; 

VAR RCLINIC RPRISON RDOSE; 

WITH TIMERANK; 

RUN; 

The PROC CORR output follows. 

The CORR Procedure

Pearson Correlation Coefficients, N = 150
Prob > |r| under H0: Rho=0

                            RCLINIC     RPRISON       RDOSE

TIMERANK                   -0.26153    -0.07970     0.07733
Rank for Variable SURVT      0.0012      0.3323      0.3469

The sample correlations with their corresponding p-values 
printed underneath are shown above. The p-values for 
CLINIC, PRISON, and DOSE are 0.0012, 0.3323, and 0.3469, 
respectively, suggesting that the PH assumption is violated for 
CLINIC, but reasonable for PRISON and DOSE. 

The same p-values can be obtained by running linear regressions
with each predictor (one at a time) using PROC REG
and examining the p-values for the regression coefficients.
The code below will produce output containing the p-value
for CLINIC.

PROC REG DATA=RANKED; 

MODEL TIMERANK=RCLINIC; 

RUN; 

The output produced by PROC REG follows.

The REG Procedure

Parameter Estimates

              Parameter     Standard
Variable      Estimate      Error        t Value    Pr > |t|

Intercept      75.49955     3.43535       21.98     <.0001
RCLINIC       -28.38848     8.61194       -3.30     0.0012


The p-value for CLINIC (0.0012) is shown in the column on 
the right and matches the p-value that was obtained using 
PROC CORR. 
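
The agreement between PROC CORR and PROC REG is expected: in a simple linear regression the t statistic for the slope depends only on the sample correlation r and the sample size n, through t = r * sqrt((n - 2)/(1 - r^2)). The short Python sketch below (an illustration outside SAS; the function name is hypothetical) recomputes the t value for RCLINIC from the correlation reported by PROC CORR.

```python
import math

def t_from_correlation(r, n):
    """t statistic (n - 2 df) for testing that a Pearson correlation is zero."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# r and N taken from the PROC CORR output for RCLINIC
t_rclinic = t_from_correlation(-0.26153, 150)
print(f"t = {t_rclinic:.2f}")
```

The result can be compared with the t Value shown for RCLINIC in the PROC REG output. Converting t to a p-value requires a t-distribution routine, which this sketch leaves out.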


5. OBTAINING COX ADJUSTED SURVIVAL CURVES 

We can use the BASELINE statement in PROC PHREG to
create an output dataset containing Cox adjusted survival
estimates for a specified pattern of covariates. The particular
pattern of covariates of interest must first be created in a SAS
dataset that is subsequently used as the input dataset for the
COVARIATES= option in the BASELINE statement of PROC
PHREG. Each pattern of covariates yields a different survival
curve (assuming nonzero effects). Adjusted log(-log)
survival plots can also be obtained for assessing the PH
assumption. This is illustrated with three examples.


Ex1: Run a PH model using PRISON, DOSE, and CLINIC and
obtain adjusted survival curves where PRISON = 0, DOSE =
70, and CLINIC = 2.

Ex2: Run a stratified Cox model (by CLINIC). Obtain two
adjusted survival curves using the mean value of PRISON and
DOSE for CLINIC = 1 and CLINIC = 2. Use the log-log curves
to assess the PH assumption for CLINIC adjusted for PRISON
and DOSE.

Ex3: Run a stratified Cox model (by CLINIC) and obtain
adjusted survival curves for PRISON = 0, DOSE = 70, and for
PRISON = 1, DOSE = 70. This yields four survival curves in
all, two for CLINIC = 1 and two for CLINIC = 2.

There are three basic steps:

1. Create the input dataset containing the pattern (values) of 
covariates used for the adjusted survival curves. 

2. Run a Cox model with PROC PHREG using the BASELINE 
statement to input the dataset from Step (1) and output a 
dataset containing the adjusted survival estimates. 

3. Plot the adjusted survival estimates from the output dataset 
created in Step (2). 

For Ex1 we create an input dataset (called IN1) with one
observation where PRISON = 0, DOSE = 70, and CLINIC =
2. We then run a model and create an output dataset (called
OUT1) containing a variable with the adjusted survival
estimates (called S1). Finally, the adjusted survival curve is
plotted using PROC GPLOT. The code follows.


DATA IN1;

INPUT PRISON DOSE CLINIC;
CARDS;

0 70 2
;

PROC PHREG DATA=REF.ADDICTS;

MODEL SURVT*STATUS(0)=PRISON DOSE CLINIC;

BASELINE COVARIATES=IN1 OUT=OUT1 SURVIVAL=S1/NOMEAN;

RUN;

PROC GPLOT DATA=OUT1;

PLOT S1*SURVT;

TITLE Adjusted survival for prison=0, dose=70, clinic=2;
RUN;


The BASELINE statement in PROC PHREG specifies the input
dataset, the output dataset, and the name of the variable
containing the adjusted survival estimates. The NOMEAN option
suppresses the survival estimates using the mean values
of PRISON, DOSE, and CLINIC. The next example (Ex2) does
not use the NOMEAN option.

The output for PROC GPLOT follows.

[Figure: Adjusted survival for prison = 0, dose = 70, clinic = 2. Vertical axis: S(t), from 0.4 to 1.0. Horizontal axis: survival time in days, from 0 to 900.]



For Ex2 we wish to create and output a dataset (called OUT2)
that contains the adjusted survival estimates from a Cox
model stratified by CLINIC using the mean values of PRISON
and DOSE. An input dataset need not be specified because by
default the mean values of PRISON and DOSE will be used if
the NOMEAN option is not used in the BASELINE statement.
The code follows.

PROC PHREG DATA=REF.ADDICTS;

MODEL SURVT*STATUS(0) = PRISON DOSE;

STRATA CLINIC;

BASELINE OUT=OUT2 SURVIVAL=S2 LOGLOGS=LS2;

RUN;

PROC GPLOT DATA=OUT2;

PLOT S2*SURVT=CLINIC;

TITLE adjusted survival stratified by clinic;

RUN;

PROC GPLOT DATA=OUT2;

PLOT LS2*SURVT=CLINIC;

TITLE log-log curves stratified by clinic, adjusted for
prison, dose;

RUN;



The code PLOT LS2*SURVT=CLINIC in the second PROC
GPLOT will plot LS2 on the vertical axis and SURVT on the
horizontal axis, with a separate curve for each CLINIC stratum
on the same graph. The variable LS2 was created in the BASELINE
statement of PROC PHREG and contains the adjusted log-log
survival estimates. The PROC GPLOT output for the log-log
survival curves stratified by CLINIC adjusted for PRISON and
DOSE follows.

[Figure: Log-log curves stratified by clinic, adjusted for prison, dose. Vertical axis: Log of Negative Log of SURVIVAL.]

The adjusted log-log plots look similar to the unadjusted log-log
Kaplan-Meier plots shown earlier in that the plots look
reasonably parallel before 365 days but then diverge, suggesting
that the PH assumption is violated after 1 year.

For Ex3, a stratified Cox model (by CLINIC) is run and adjusted
curves are obtained for PRISON = 1 and PRISON = 0 holding
DOSE = 70. An input dataset (called IN3) is created with two
observations, one for each level of PRISON, with DOSE = 70. An
output dataset (called OUT3) is created with the BASELINE
statement that contains a variable (called S3) of survival
estimates for all four curves (two for each stratum of CLINIC).
The code follows.

DATA IN3;

INPUT PRISON DOSE;
CARDS;

1 70
0 70
;

PROC PHREG DATA=REF.ADDICTS;

MODEL SURVT*STATUS(0)= PRISON DOSE;

STRATA CLINIC;

BASELINE COVARIATES=IN3 OUT=OUT3 SURVIVAL=S3/NOMEAN;

RUN;

PROC GPLOT DATA=OUT3;

PLOT S3*SURVT=CLINIC;

TITLE adjusted survival stratified by clinic for both levels
of prison;

RUN;

The PROC GPLOT output follows.

[Figure: Adjusted survival stratified by clinic for both levels of prison]

For the above graph, the PH assumption is not imposed for
CLINIC because CLINIC is the stratification variable. However,
the PH assumption is assumed for PRISON within each stratum
of CLINIC (i.e., CLINIC = 1 and CLINIC = 2).

6. RUNNING AN EXTENDED COX MODEL

Models containing time-dependent variables are run using
PROC PHREG. Time-dependent variables are created with
programming statements within the PROC PHREG procedure.
Sometimes users incorrectly define time-dependent
variables in the data step. This leads to wrong estimates
because the time variable used in the data step (SURVT) is
actually time-independent and therefore different from the time
variable (also called SURVT) used to define time-dependent
variables within PROC PHREG. See the discussion
on the extended Cox likelihood in Chapter 6 for further
clarification of this issue.

We have evaluated the PH assumption for the variable CLINIC 
by plotting KM log-log curves and Cox adjusted log-log curves 
stratified by CLINIC and checking whether the curves were 
parallel. We could do similar analyses with the variables 
PRISON and DOSE although with DOSE we would need to 
categorize the continuous variable before comparing plots for 
different strata of DOSE. 

If it is expected that the hazard ratio for the effect of DOSE
increases (or decreases) monotonically with time we could
add a continuous time-varying product term with DOSE and
some function of time. The model defined below contains a
time-varying variable (LOGTDOSE) defined as the product of
DOSE and the natural log of time (SURVT). In some sense a
violation of the proportional hazard assumption for a particular
variable means that there is an interaction between that
variable and time. Note that the variable LOGTDOSE is
defined within the PHREG procedure and not in the data step.
The code follows.

PROC PHREG DATA=REF.ADDICTS; 

MODEL SURVT*STATUS(0)=PRISON CLINIC DOSE LOGTDOSE; 

LOGTDOSE=DOSE* LOG(SURVT); 

RUN; 

The output produced by PROC PHREG follows.

The PHREG Procedure

Analysis of Maximum Likelihood Estimates

                 Parameter    Standard                               Hazard
Variable    DF   Estimate     Error       Chi-Square   Pr > ChiSq    Ratio

PRISON      1     0.34047     0.16747     4.1333       0.0420        1.406
CLINIC      1    -1.01857     0.21538     22.3655      <.0001        0.361
DOSE        1    -0.08243     0.03599     5.2468       0.0220        0.921
LOGTDOSE    1     0.00858     0.00646     1.7646       0.1841        1.009


The Wald test for the time-dependent variable LOGTDOSE
yields a p-value of 0.1841. A nonsignificant p-value does not
necessarily mean that the PH assumption is reasonable for
DOSE. Perhaps a differently defined time-dependent variable
would have been significant (e.g., DOSE × (TIME - 100)).
Also the sample size of the study is a key determinant of the
power to reject the null, which in this case means rejection of
the PH assumption.
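
Under the LOGTDOSE model, the hazard ratio for a one-unit (1 mg/day) increase in DOSE depends on time through exp(b_DOSE + b_LOGTDOSE * log(t)). A minimal Python sketch (illustrative only, outside SAS; the function name and the two time points are our choices) evaluates this using the estimates from the output.

```python
import math

beta_dose = -0.08243      # DOSE
beta_logtdose = 0.00858   # LOGTDOSE = DOSE * LOG(SURVT)

def dose_hazard_ratio(days):
    """HR per 1 mg/day increase in DOSE at a given follow-up time in days."""
    return math.exp(beta_dose + beta_logtdose * math.log(days))

hr_100 = dose_hazard_ratio(100)
hr_400 = dose_hazard_ratio(400)
print(f"day 100: {hr_100:.3f}, day 400: {hr_400:.3f}")
```

The estimated hazard ratio moves slowly toward 1 as time increases, although the Wald test above indicates this trend is not statistically significant.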

Next we consider time-dependent variables for CLINIC. The 
next two models use Heaviside functions that allow a different 
hazard ratio for CLINIC before and after 365 days. The first 
model uses two Heaviside functions in the model (HV1 and 
HV2) but not CLINIC. The second model uses one Heaviside 
function (HV) but also includes CLINIC in the model. These 
two models yield the same hazard ratio estimates for CLINIC 
but are coded differently. The code follows. 


PROC PHREG DATA=REF.ADDICTS;

MODEL SURVT*STATUS(0)=PRISON DOSE HV1 HV2;

IF SURVT < 365 THEN HV1 = CLINIC; ELSE HV1 = 0;
IF SURVT >= 365 THEN HV2 = CLINIC; ELSE HV2 = 0;
RUN;

PROC PHREG DATA=REF.ADDICTS; 

MODEL SURVT*STATUS(0)=CLINIC PRISON DOSE HV; 

IF SURVT >= 365 THEN HV = CLINIC; ELSE HV = 0; 

RUN; 

The output for the model with two Heaviside functions
follows.


The PHREG Procedure

Analysis of Maximum Likelihood Estimates

                 Parameter    Standard                               Hazard
Variable    DF   Estimate     Error       Chi-Square   Pr > ChiSq    Ratio

PRISON      1     0.37770     0.16840     5.0304       0.0249        1.459
DOSE        1    -0.03551     0.00644     30.4503      <.0001        0.965
HV1         1    -0.45956     0.25529     3.2405       0.0718        0.632
HV2         1    -1.82823     0.38595     22.4392      <.0001        0.161


The parameter estimates for HV1 and HV2 can be used
directly to obtain the estimated hazard ratio for CLINIC = 2
vs. CLINIC = 1 before and after 365 days. The estimated hazard
ratio for CLINIC at 100 days is exp(-0.45956) = 0.632
and the estimated hazard ratio for CLINIC at 400 days is
exp(-1.82823) = 0.161.

The output for the model with one Heaviside function follows.

The PHREG Procedure

Analysis of Maximum Likelihood Estimates

                 Parameter    Standard                               Hazard
Variable    DF   Estimate     Error       Chi-Square   Pr > ChiSq    Ratio

CLINIC      1    -0.45956     0.25529     3.2405       0.0718        0.632
PRISON      1     0.37770     0.16840     5.0304       0.0249        1.459
DOSE        1    -0.03551     0.00644     30.4503      <.0001        0.965
HV          1    -1.36866     0.46139     8.7993       0.0030        0.254


Notice the variable CLINIC is included in this model and
the coefficient for the time-dependent Heaviside function
HV does not contribute to the estimated hazard ratio until
day 365. The estimated hazard ratio for CLINIC at 100
days is exp(-0.45956) = 0.632 and the estimated hazard ratio
for CLINIC at 400 days is exp((-0.45956) + (-1.36866)) =
0.161. These results are consistent with the estimates
obtained from the model with two Heaviside functions. A Wald
test for the variable HV shows a statistically significant
p-value of 0.003, suggesting a violation of the PH assumption
for CLINIC.
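
The equivalence of the two codings can be confirmed with a few lines of arithmetic. This is a sketch in Python (used only as a calculator outside SAS), with hypothetical names and the coefficients copied from the one-Heaviside output.

```python
import math

beta_clinic = -0.45956  # CLINIC (applies at all times)
beta_hv = -1.36866      # HV (equals CLINIC only when SURVT >= 365)

def clinic_hazard_ratio(days):
    """HR for CLINIC = 2 vs. CLINIC = 1 at a given day, one-Heaviside model."""
    heaviside = 1 if days >= 365 else 0
    return math.exp(beta_clinic + heaviside * beta_hv)

hr_day_100 = clinic_hazard_ratio(100)
hr_day_400 = clinic_hazard_ratio(400)
print(f"day 100: {hr_day_100:.3f}, day 400: {hr_day_400:.3f}")
```

The sum of the CLINIC and HV coefficients (-1.82822) matches the HV2 coefficient from the two-Heaviside model up to rounding, which is why both models give the same hazard ratios.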

Suppose it is believed that the hazard ratio for CLINIC = 2 vs.
CLINIC = 1 is constant over the first year but then monotonically
increases (or decreases) after the first year. The following
code defines a model allowing for a time-varying covariate
called CLINTIME (defined in the code) which contributes to
the hazard ratio for CLINIC after 365 days. The code follows
(output omitted).

PROC PHREG DATA=REF.ADDICTS; 

MODEL SURVT*STATUS(0)=CLINIC PRISON DOSE CLINTIME; 

IF SURVT < 365 THEN CLINTIME=0; 

ELSE IF SURVT >= 365 THEN CLINTIME = CLINIC*(SURVT-365); 

RUN; 


7. RUNNING PARAMETRIC MODELS 
WITH PROC LIFEREG 

PROC LIFEREG runs parametric accelerated failure time
(AFT) models rather than proportional hazards (PH) models.
Whereas the key assumption of a PH model is that hazard
ratios are constant over time, the key assumption for an
AFT model is that survival time accelerates (or decelerates)
by a constant factor when comparing different levels of
covariates.

The most common distribution for parametric modeling of
survival data is the Weibull distribution. The hazard function
for a Weibull distribution is $h(t) = \lambda p t^{p-1}$. The Weibull
distribution has a desirable property in that if the AFT
assumption holds then the PH assumption also holds. The
exponential distribution is the special case of the Weibull
distribution with p = 1. The key property of the exponential
distribution is that the hazard is constant over time
($h(t) = \lambda$). In SAS, the Weibull and exponential models are
run only as AFT models.

The Weibull distribution has the property that the log-log
of the survival function is linear with the log of time. PROC
LIFETEST can be used to plot Kaplan-Meier log-log curves
against the log of time. If the curves are approximately
straight lines (and parallel) then the Weibull assumption is
reasonable. Furthermore, if the straight lines have a slope of 1, then
the exponential distribution is appropriate. The code below
produces log-log curves stratified by CLINIC and PRISON
that can be used to check the validity of the Weibull
assumption for those variables.

PROC LIFETEST DATA=REF.ADDICTS METHOD=KM PLOTS=(LLS);
TIME SURVT*STATUS(0);

STRATA CLINIC PRISON;

RUN;

The log-log survival plots produced by PROC LIFETEST
follow.

[Figure: log-log survival curves for four strata: CLINIC = 1 PRISON = 0; CLINIC = 1 PRISON = 1; CLINIC = 2 PRISON = 0; CLINIC = 2 PRISON = 1]


The log-log curves do not look straight but for illustration we 
proceed as if the Weibull assumption were appropriate. First 
an exponential model is run with PROC LIFEREG. In this 
model the Weibull shape parameter (p) is forced to equal 1, 
which forces the hazard to be constant. 

PROC LIFEREG DATA=REF.ADDICTS; 

MODEL SURVT*STATUS(0)=PRISON DOSE CLINIC/DIST=EXPONENTIAL; 

RUN; 

The DIST=EXPONENTIAL option in the MODEL statement
requests the exponential distribution. The output of parameter
estimates obtained from PROC LIFEREG follows.


Analysis of Parameter Estimates

                                  Standard    95% Confidence        Chi-
Parameter       DF    Estimate    Error       Limits                Square    Pr > ChiSq

Intercept       1      3.6843     0.4307       2.8402    4.5285     73.17     <.0001
PRISON          1     -0.2526     0.1649      -0.5758    0.0705      2.35     0.1255
DOSE            1      0.0289     0.0061       0.0169    0.0410     22.15     <.0001
CLINIC          1      0.8806     0.2106       0.4678    1.2934     17.48     <.0001
Scale           0      1.0000     0.0000       1.0000    1.0000
Weibull Shape   0      1.0000     0.0000       1.0000    1.0000

The exponential model assumes a constant hazard. This is
indicated in the output by the value of the Weibull shape
parameter (1.0000). The output can be used to calculate the
estimated hazard for any subject given a pattern of covariates.
For example, a subject with PRISON = 0, DOSE = 50,
and CLINIC = 2 has an estimated hazard of exp(-(3.6843 +
50(0.0289) + 2(0.8806))) = 0.001. Note that SAS gives the
parameter estimates for the AFT form of the exponential model.
Multiply the estimated coefficients by negative one to get
estimates consistent with the PH parameterization of the model
(see Chapter 7).
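
The hazard calculation above can be reproduced for any covariate pattern. The sketch below is in Python (an illustration outside SAS); the function name is hypothetical and the coefficients are the AFT-form estimates copied from the PROC LIFEREG exponential output.

```python
import math

# AFT-form estimates from the PROC LIFEREG exponential model
intercept, b_prison, b_dose, b_clinic = 3.6843, -0.2526, 0.0289, 0.8806

def exponential_hazard(prison, dose, clinic):
    """Constant hazard implied by the AFT coefficients: exp(-(linear predictor))."""
    linear_predictor = (intercept + b_prison * prison
                        + b_dose * dose + b_clinic * clinic)
    return math.exp(-linear_predictor)

hazard = exponential_hazard(prison=0, dose=50, clinic=2)
print(f"estimated hazard = {hazard:.6f} per day")
```

The negative sign inside the exponent is the conversion from the AFT parameterization to the PH parameterization described in the text.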

Next a Weibull AFT model is run with PROC LIFEREG. 

PROC LIFEREG DATA=REF.ADDICTS; 

MODEL SURVT*STATUS(0)=PRISON DOSE CLINIC/DIST=WEIBULL; 
RUN; 

The DIST=WEIBULL option in the MODEL statement
requests the Weibull distribution. The output for the parameter
estimates follows.


Analysis of Parameter Estimates

                                  Standard    95% Confidence        Chi-
Parameter       DF    Estimate    Error       Limits                Square    Pr > ChiSq

Intercept       1      4.1048     0.3281       3.4619    4.7478     156.56    <.0001
PRISON          1     -0.2295     0.1208      -0.4662    0.0073       3.61    0.0575
DOSE            1      0.0244     0.0046       0.0154    0.0334      28.32    <.0001
CLINIC          1      0.7090     0.1572       0.4009    1.0172      20.34    <.0001
Scale           1      0.7298     0.0493       0.6393    0.8332
Weibull Shape   1      1.3702     0.0926       1.2003    1.5642




The Weibull shape parameter is estimated at 1.3702. SAS calls
the reciprocal of the Weibull shape parameter the Scale
parameter, estimated at 0.7298. The acceleration factor comparing
CLINIC = 2 to CLINIC = 1 is estimated at exp(0.7090) =
2.03. So the estimated median survival time (time off heroin)
is about double for patients enrolled in CLINIC = 2 compared to
CLINIC = 1.

To obtain the hazard ratio parameters from the Weibull AFT
model, multiply the Weibull shape parameter by the negative
of the AFT parameter (see Chapter 7). For example, the HR
estimate for CLINIC = 2 vs. CLINIC = 1 controlling for the
other covariates is exp(1.3702(-0.7090)) = 0.38.
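
The relationships among the Scale, shape, acceleration factor, and hazard ratio can be verified directly. This is a minimal Python sketch (a calculator-style check outside SAS, with hypothetical variable names) using the Weibull estimates above.

```python
import math

scale = 0.7298       # SAS "Scale" estimate from the Weibull output
aft_clinic = 0.7090  # AFT coefficient for CLINIC

shape = 1.0 / scale                           # Weibull shape p
acceleration_factor = math.exp(aft_clinic)    # CLINIC = 2 vs. CLINIC = 1
hazard_ratio = math.exp(-shape * aft_clinic)  # PH-form hazard ratio

print(f"shape = {shape:.4f}")
print(f"acceleration factor = {acceleration_factor:.2f}")
print(f"hazard ratio = {hazard_ratio:.2f}")
```

An acceleration factor above 1 (longer survival for CLINIC = 2) corresponds to a hazard ratio below 1, as expected.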

Next a log-logistic AFT model is run with PROC LIFEREG.

PROC LIFEREG DATA=REF.ADDICTS;

MODEL SURVT*STATUS(0)=PRISON DOSE CLINIC/DIST=LLOGISTIC;
RUN;

The output of the log-logistic parameter estimates follows. 


Analysis of Parameter Estimates

                                  Standard    95% Confidence        Chi-
Parameter       DF    Estimate    Error       Limits                Square    Pr > ChiSq

Intercept       1      3.5633     0.3894       2.8000    4.3266     83.71     <.0001
PRISON          1     -0.2913     0.1440      -0.5734   -0.0091      4.09     0.0431
DOSE            1      0.0316     0.0055       0.0208    0.0424     32.81     <.0001
CLINIC          1      0.5806     0.1716       0.2443    0.9169     11.45     0.0007
Scale           1      0.5868     0.0403       0.5129    0.6712




From this output, the acceleration factor comparing
CLINIC = 2 to CLINIC = 1 is estimated as exp(0.5806) =
1.79. If the AFT assumption holds for a log-logistic model,
then the proportional odds assumption holds for the survival
function (although the PH assumption will not hold). The
proportional odds assumption can be evaluated by plotting the
log odds of survival (using KM estimates) against the log of
survival time. If the plots are straight lines for each pattern of
covariates then the log-logistic distribution is reasonable. If
the straight lines are also parallel then the proportional odds
and AFT assumptions also hold.

A SAS dataset containing the KM survival estimates can be
created using PROC LIFETEST (see Section 1 of this
appendix). Once this dataset is created, a dataset containing
variables for the estimated log odds of survival and the log of
survival time can also be created. PROC GPLOT can then be
used to plot the log odds of survival against the log of survival time.
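
The transformation involved is simple: for each KM estimate S(t), the log odds of survival is log(S(t)/(1 - S(t))), paired with log(t). The following Python sketch illustrates the calculation outside SAS; the function name is hypothetical and the survival values are made up for illustration, not taken from the addicts data.

```python
import math

def log_odds_points(times, km_survival):
    """Return (log time, log odds of survival) pairs for plotting.

    Points with S(t) equal to 0 or 1 are skipped because the odds are undefined.
    """
    points = []
    for t, s in zip(times, km_survival):
        if t > 0 and 0.0 < s < 1.0:
            points.append((math.log(t), math.log(s / (1.0 - s))))
    return points

# Made-up KM estimates for illustration only (not from the addicts data)
example = log_odds_points([10, 100, 400], [0.9, 0.5, 0.2])
for log_t, log_odds in example:
    print(f"log(t) = {log_t:.3f}, log odds = {log_odds:.3f}")
```

In practice the pairs would be computed separately for each covariate pattern and plotted on one graph to judge linearity and parallelism.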

Another context for thinking about the proportional odds
assumption is that the odds ratio estimated by a logistic
regression does not depend on the length of the follow-up. For
example, if a follow-up study were extended from three to five
years then the underlying odds ratio comparing two patterns
of covariates would not change. If the proportional odds
assumption is not true, then the odds ratio is specific to the
length of follow-up.

An AFT model is a multiplicative model with respect to
survival time or equivalently an additive model with respect to the
log of time. In the previous example, the median survival time
was estimated as 1.79 times longer for CLINIC = 2 compared
to CLINIC = 1. In that example survival time was assumed
to follow a log-logistic distribution or equivalently the log of
survival time was assumed to follow a logistic distribution.

SAS allows additive failure time models to be run (see 
Chapter 7 under the heading “Other Parametric Models”). The 
NOLOG option in the MODEL statement of PROC LIFEREG 
suppresses the default log link function which means that 
time, rather than log(time), is modeled as a linear function 
of the regression parameters. The following code requests an 
additive failure time model in which time follows a logistic 
(not log-logistic) distribution. 

PROC LIFEREG DATA=REF.ADDICTS;
MODEL SURVT*STATUS(0)=PRISON DOSE CLINIC/DIST=LLOGISTIC NOLOG;
RUN;

Even though the option DIST=LLOGISTIC appears to request
that survival time follow a log-logistic distribution, the
NOLOG option actually means that survival time is assumed
to follow a logistic distribution. (Note that the NOLOG option
in Stata means something completely different using the
streg command: that the iteration log file not be shown in
the output.) The output from the additive failure time model
follows.


Analysis of Parameter Estimates

                             Standard    95% Confidence        Chi-
Parameter  DF   Estimate     Error       Limits                Square   Pr > ChiSq
Intercept  1    -358.482     114.0161    -581.949   -135.014    9.89    0.0017
PRISON     1     -89.7816     42.9645    -173.990     -5.5727   4.37    0.0366
DOSE       1      10.3893      1.6244       7.2055    13.5731  40.91    <.0001
CLINIC     1     214.2525     53.1204     110.1385   318.3665  16.27    <.0001
Scale      1     172.4039     11.3817     151.4792   196.2191

The parameter estimate for CLINIC is 214.2525. The interpretation
for this estimate is that the median survival time
(or time to any fixed value of S(t)) is estimated at 214 days
more for CLINIC = 2 compared to CLINIC = 1. In other
words, you add 214 days to the estimated median survival
time for CLINIC = 1 to get the estimated median survival
time for CLINIC = 2. This contrasts with the previous AFT
model in which you multiply estimated median survival time
for CLINIC = 1 by 1.79 to get the estimated median survival
time for CLINIC = 2. The additive model can be viewed as a
shifting of survival time whereas the AFT model can be viewed
as a scaling of survival time.
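
The contrast between shifting and scaling can be checked numerically. The following is a minimal sketch in Python (not part of the SAS analysis). It uses the additive-model estimates from the output above and the factor 1.79 quoted from the earlier AFT model. The covariate pattern (PRISON = 0, DOSE = 60) is a hypothetical choice made only for illustration, and the sketch relies on the fact that the median of a logistic distribution equals its location, here the linear predictor.

```python
# Additive failure time model: the median is the linear predictor itself.
COEF = {"intercept": -358.482, "prison": -89.7816,
        "dose": 10.3893, "clinic": 214.2525}

def additive_median(prison, dose, clinic):
    """Estimated median survival time (days) under the logistic model."""
    return (COEF["intercept"] + COEF["prison"] * prison
            + COEF["dose"] * dose + COEF["clinic"] * clinic)

# Hypothetical covariate pattern, chosen only for illustration.
m1 = additive_median(prison=0, dose=60, clinic=1)
m2 = additive_median(prison=0, dose=60, clinic=2)
shift = m2 - m1             # additive model: a constant difference in days

ACCELERATION_FACTOR = 1.79  # quoted from the earlier log-logistic AFT model

def aft_median_clinic2(median_clinic1):
    """AFT model: the CLINIC = 2 median is a multiple of the CLINIC = 1 median."""
    return ACCELERATION_FACTOR * median_clinic1

print(round(m1, 1), round(m2, 1), round(shift, 4))
```

Whatever values are chosen for PRISON and DOSE, the difference between the two medians stays at the CLINIC coefficient, whereas under the AFT model the difference grows with the CLINIC = 1 median.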

If survival time follows a logistic distribution and the additive
failure time assumption holds, then the proportional
odds assumption also holds. The logistic assumption can be
evaluated by plotting the log odds of survival (using KM estimates)
against time (rather than against the log of time as
analogously used for the evaluation of the log-logistic assumption).
If the plots are straight lines for each pattern of covariates
then the logistic distribution is reasonable. If the straight
lines are also parallel then the proportional odds and additive
failure time assumptions hold.
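
The reason straight, parallel lines are expected can be seen from the logistic survivor function itself. The sketch below (Python, for illustration only) evaluates the log odds of survival for two groups whose locations differ by the CLINIC coefficient. The scale and CLINIC values are taken from the output above; the location 400 for CLINIC = 1 is an arbitrary assumption.

```python
import math

SCALE = 172.4039         # scale estimate from the output above
CLINIC_COEF = 214.2525   # shift in location between the two clinics

def survival(t, mu):
    """Logistic survivor function with location mu and scale SCALE."""
    return 1.0 / (1.0 + math.exp((t - mu) / SCALE))

def log_odds(s):
    """Log odds of survival, ln(S / (1 - S))."""
    return math.log(s / (1.0 - s))

mu1 = 400.0              # hypothetical location for CLINIC = 1
mu2 = mu1 + CLINIC_COEF  # additive shift for CLINIC = 2

times = [100, 300, 500, 700]
lo1 = [log_odds(survival(t, mu1)) for t in times]
lo2 = [log_odds(survival(t, mu2)) for t in times]

# Slope of each line against time, and the vertical gap between the lines.
slopes = [(lo1[i + 1] - lo1[i]) / (times[i + 1] - times[i]) for i in range(3)]
gaps = [b - a for a, b in zip(lo1, lo2)]
print(slopes, gaps)
```

Each slope equals -1/SCALE and each gap equals CLINIC_COEF/SCALE, so the two curves are straight lines in time with a constant vertical separation.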

Other distributions supported by PROC LIFEREG are
the generalized gamma (DIST=GAMMA) and lognormal
(DIST=LNORMAL) distributions. If the NOLOG option is
specified with the DIST=LNORMAL option in the model statement,
then survival time is assumed to follow a normal distribution.
SAS version 8.2 does not support frailty survival
models.

8. MODELING RECURRENT EVENTS 

The modeling of recurrent events is illustrated with the bladder
cancer dataset (bladder.sas7bdat) described at the start
of this appendix. Recurrent events are represented in the
data with multiple observations for subjects having multiple
events. The data layout for the bladder cancer dataset is suitable
for a counting process approach with time intervals defined
for each observation (see Chapter 8). The following code
prints the 12th through 20th observations, which contain
information for four subjects.


PROC PRINT DATA=REF.BLADDER (FIRSTOBS=12 OBS=20);
RUN;

The output follows.

OBS   ID   EVENT   INTERVAL   START   STOP   TX   NUM   SIZE
 12   10     1        1          0     12     0    1     1
 13   10     1        2         12     16     0    1     1
 14   10     0        3         16     18     0    1     1
 15   11     0        1          0     23     0    3     3
 16   12     1        1          0     10     0    1     3
 17   12     1        2         10     15     0    1     3
 18   12     0        3         15     23     0    1     3
 19   13     1        1          0      3     0    1     1
 20   13     1        2          3     16     0    1     1


There are three observations for ID = 10, one observation for 
ID = 11, three observations for ID = 12, and two observations 
for ID = 13. The variables START and STOP represent the 
time interval for the risk period specific to that observation. 
The variable EVENT indicates whether an event (coded 1) 
occurred. The first three observations indicate that the subject 
with ID = 10 had an event at 12 months, another event at 
16 months, and was censored at 18 months. 
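
The way a subject's history is read off the counting process layout can be sketched as follows. This is a minimal Python illustration, not SAS code; the tuples simply transcribe the ID, EVENT, INTERVAL, START, and STOP columns of the nine printed observations, and the helper name summarize is hypothetical.

```python
from collections import defaultdict

# (ID, EVENT, INTERVAL, START, STOP) for observations 12-20 listed above
rows = [
    (10, 1, 1, 0, 12), (10, 1, 2, 12, 16), (10, 0, 3, 16, 18),
    (11, 0, 1, 0, 23),
    (12, 1, 1, 0, 10), (12, 1, 2, 10, 15), (12, 0, 3, 15, 23),
    (13, 1, 1, 0, 3), (13, 1, 2, 3, 16),
]

# Group the risk intervals by subject, keeping them in printed order.
history = defaultdict(list)
for subject, event, interval, start, stop in rows:
    history[subject].append((start, stop, event))

def summarize(subject):
    """Return (event times, end of last printed interval) for one subject."""
    intervals = history[subject]
    event_times = [stop for _, stop, event in intervals if event == 1]
    return event_times, intervals[-1][1]

for subject in sorted(history):
    print(subject, len(history[subject]), summarize(subject))
```

Only the printed excerpt is used here, so a subject whose records continue past observation 20 would have a longer history in the full dataset.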

PROC PHREG can be used for survival data using a counting
processes data layout. The following code runs a model
with the predictors, treatment status (TX), initial number of
tumors (NUM), and the initial size of tumors (SIZE) included
in the model.

PROC PHREG DATA=BLADDER COVS(AGGREGATE);
MODEL (START,STOP)*EVENT(0)=TX NUM SIZE;
ID ID;
RUN;

The code (START,STOP)*EVENT(0) in the MODEL statement
indicates that the time intervals for each observation
are defined by the variables START and STOP and that
EVENT = 0 denotes a censored observation. The ID statement
defines ID as the variable representing subject. The
COVS(AGGREGATE) option in the PROC PHREG statement
requests robust standard errors for the parameter estimates.
The output generated by PROC PHREG follows.

The PHREG Procedure
Model Information

Data Set               WORK.BLADDER
Dependent Variable     START
Dependent Variable     STOP
Censoring Variable     EVENT
Censoring Value(s)     0
Ties Handling          BRESLOW

Analysis of Maximum Likelihood Estimates

           Parameter   Standard   StdErr                              Hazard
Variable   Estimate    Error      Ratio    Chi-Square   Pr > ChiSq    Ratio
TX         -0.40710    0.24183    1.209    2.8338       0.0923        0.666
NUM         0.16065    0.05689    1.185    7.9735       0.0047        1.174
SIZE       -0.04009    0.07222    1.028    0.3081       0.5788        0.961


Coefficient estimates are provided with robust standard errors.
The column under the heading StdErrRatio provides
the ratio of the robust to the nonrobust standard errors.
For example, the standard error for the coefficient for TX
(0.24183) is 1.209 times as large as the standard error would be
if we had not requested robust standard errors (i.e., had omitted the
COVS(AGGREGATE) option). The robust standard errors are
estimated slightly differently compared to the corresponding
model in Stata.
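
The relationships among the columns of this output can be verified with a few lines of arithmetic. The sketch below (Python, for illustration) uses only the TX row printed above; small discrepancies in the last digit are expected because the printed values are rounded.

```python
import math

# TX row of the PROC PHREG output above
estimate, robust_se, se_ratio = -0.40710, 0.24183, 1.209

nonrobust_se = robust_se / se_ratio            # SE without COVS(AGGREGATE)
wald_chi_square = (estimate / robust_se) ** 2  # Wald test uses the robust SE
hazard_ratio = math.exp(estimate)

print(round(nonrobust_se, 4), round(wald_chi_square, 3), round(hazard_ratio, 3))
```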

A stratified Cox model can also be run using the data in this
format with the variable INTERVAL as the stratified variable.
The stratified variable indicates whether the subject was at
risk for a 1st, 2nd, 3rd, or 4th event. This approach is called
conditional 1 in Chapter 8 and is used if the investigator
wants to distinguish the order in which recurrent events occur.
The code for a stratified Cox follows.

PROC PHREG DATA=BLADDER COVS(AGGREGATE);
MODEL (START,STOP)*EVENT(0)=TX NUM SIZE;
ID ID;
STRATA INTERVAL;
RUN;

The only additional code from the previous model is the
STRATA statement indicating that the variable INTERVAL is
the stratified variable. The output containing the parameter
estimates follows.

Analysis of Maximum Likelihood Estimates

           Parameter   Standard   StdErr                              Hazard
Variable   Estimate    Error      Ratio    Chi-Square   Pr > ChiSq    Ratio
TX         -0.33430    0.19706    0.912    2.8777       0.0898        0.716
NUM         0.11565    0.04991    0.930    5.3690       0.0205        1.123
SIZE       -0.00805    0.06012    0.827    0.0179       0.8935        0.992


Interaction terms between the treatment variable (TX) and 
the stratified variable could be created to examine whether 
the effect of treatment differed for the 1st, 2nd, 3rd, or 4th 
event. 

Another stratified approach (called conditional 2) is a slight 
variation of the conditional 1 approach. The difference is in 
the way the time intervals for the recurrent events are defined. 
There is no difference in the time intervals when subjects are 
at risk for their first event. However, with the conditional 2 
approach, the starting time at risk gets reset to zero for each 
subsequent event. The following code creates data suitable 
for using the conditional 2 approach. 

DATA BLADDER2;
SET REF.BLADDER;
START2=0;
STOP2=STOP-START;
RUN;

The new dataset (BLADDER2) copies the data from 
REF.BLADDER and creates two new variables for the time 
interval: START2, which is always set to zero and STOP2, 
which is the length of the time interval (i.e., STOP - START). 
The following code uses these newly created variables to run 
the conditional 2 approach with PROC PHREG. 

PROC PHREG DATA=BLADDER2 COVS(AGGREGATE); 
MODEL (START2,STOP2)*EVENT(0)=TX NUM SIZE; 
ID ID; 

STRATA INTERVAL; 

RUN; 

The output follows.

           Parameter   Standard   StdErr                              Hazard
Variable   Estimate    Error      Ratio    Chi-Square   Pr > ChiSq    Ratio
TX         -0.26952    0.20808    1.002    1.6778       0.1952        0.764
NUM         0.15353    0.04889    0.938    9.8620       0.0017        1.166
SIZE        0.00684    0.06222    0.889    0.0121       0.9125        1.007


The results using the conditional 1 and conditional 2 approaches
vary slightly.
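
The only difference between the two layouts is how each risk interval is expressed. As a minimal sketch (Python, for illustration; the function name to_gap_time is hypothetical), the conditional 2 reset applied to the three observations for ID = 10 looks like this:

```python
def to_gap_time(rows):
    """Reset each (start, stop) interval so that it begins at zero."""
    return [
        {**row, "start2": 0, "stop2": row["stop"] - row["start"]}
        for row in rows
    ]

# The three observations for ID = 10 from the listing earlier in this section
subject_10 = [
    {"id": 10, "event": 1, "interval": 1, "start": 0, "stop": 12},
    {"id": 10, "event": 1, "interval": 2, "start": 12, "stop": 16},
    {"id": 10, "event": 0, "interval": 3, "start": 16, "stop": 18},
]

gap = to_gap_time(subject_10)
print([(r["start2"], r["stop2"]) for r in gap])
```

The first interval is unchanged, while the second and third become intervals of length 4 and 2 months measured from the previous event.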

The counting process data layout with multiple observations 
per subject need not only apply to recurrent event data but 
can also be used for more conventional survival analyses in 
which each subject is limited to one event. A subject with four 
observations may be censored for the first three observations 
before getting the event in the time interval represented by the 
fourth observation. This data layout is particularly suitable 
for representing time-varying exposures (i.e., exposures that 
change values over different intervals of time). 


C. SPSS 

Analyses are carried out in SPSS by using the appropriate
SPSS procedure on an SPSS dataset. Most users select procedures
by pointing and clicking the mouse through a series
of menus and dialog boxes. The code, or command syntax,
generated by these steps can be viewed and edited.

Analyses on the addicts dataset are used to illustrate these
procedures. The addicts dataset was obtained from a 1991
Australian study by Caplehorn et al., and contains information
on 238 heroin addicts. The study compared two
methadone treatment clinics to assess patient time remaining
under methadone treatment. The two clinics differed according
to live-in policies for patients. A patient’s survival time was
determined as the time (in days) until the person dropped out
of the clinic or was censored. The variables are defined at the
start of this appendix.

After getting into SPSS, open the dataset addicts.sav. The
data should appear on your screen. This is now your working
dataset. To obtain a basic descriptive analysis of the
outcome variable (SURVT) click on Analyze → Descriptive
Statistics → Descriptives from the drop-down menus to reach
the dialog box to specify the analytic variables. Select the survival
time variable (SURVT) from the list of variables and
enter it into the Variable box. Click on OK to view the output.
Alternatively you can click on Paste (rather than OK) to
obtain the corresponding SPSS syntax. The syntax can then
be submitted (by clicking the button under Run), edited, or
saved for another session. The syntax created is as follows
(output omitted).

DESCRIPTIVES
  VARIABLES=survt
  /STATISTICS=MEAN STDDEV MIN MAX .

There are some analyses that SPSS only performs by submitting
syntax rather than using the point and click approach
(e.g., running an extended Cox model with two time-varying
covariates). Each time the point and click approach is presented
the corresponding syntax will also be presented.

To obtain more detailed descriptive statistics on survival time
stratified by CLINIC, click on Analyze → Descriptive Statistics
→ Explore from the drop-down menus. Select SURVT
from the list of variables and enter it into the Dependent List
and then select CLINIC and enter it into the Factor List. Click
on OK to see the output. The syntax created from clicking on
Paste (rather than OK) is as follows (output omitted).

EXAMINE
  VARIABLES=survt BY clinic
  /PLOT BOXPLOT STEMLEAF
  /COMPARE GROUP
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL .

Survival analyses can be performed in SPSS by selecting Analyze
→ Survival. There are then four choices for selection:
Life Tables, Kaplan-Meier, Cox Regression, and Cox w/ Time-Dep
Cov. The key SPSS procedures for survival analysis are
the KM and COXREG procedures.

The survival analyses demonstrated in SPSS are:

1. Estimating survival functions (unadjusted) and comparing 
them across strata; 

2. Assessing the PH assumption using Kaplan-Meier log-log 
survival curves; 

3. Running a Cox PH model; 

4. Running a stratified Cox model and obtaining Cox adjusted 
log-log curves; 

5. Assessing the PH assumption with a statistical test; and 

6. Running an extended Cox model. 

SPSS version 11.5 does not provide commands to run parametric
survival models, frailty models, or models using a
counting processes data layout for recurrent events.

1. ESTIMATING SURVIVAL FUNCTIONS 
(UNADJUSTED) AND COMPARING 
THEM ACROSS STRATA 

To obtain Kaplan-Meier survival estimates, select Analyze →
Survival → Kaplan-Meier. Select the survival time variable
(SURVT) from the variable list and enter it into the Time box,
then select the variable STATUS and enter it into the Status
box. You will then see a question mark in parentheses after
the status variable indicating that the value of the event needs
to be entered. Click the Define Event button and insert the
value 1 in the box because the variable STATUS is coded 1
for events and 0 for censorships. Click on Continue and then
OK to view the output. The syntax, obtained from clicking on
Paste (rather than OK), is as follows (output omitted).

KM
  survt /STATUS=status(1)
  /PRINT TABLE MEAN .

The stream of output of these KM estimates is quite long and
does not all initially appear in the output window (called the
SPSS viewer). In order to make it easier to view the output,
try right-clicking inside the output and then select SPSS Rtf
Document Object → Open to open up a text output window.
The output is easier to view and can also be edited from this
window.

To obtain KM survival estimates and plots by CLINIC as well
as log rank (and other) test statistics, select Analyze → Survival
→ Kaplan-Meier and then select SURVT as the time-to-event
variable and STATUS as the status variable as described
above. Enter CLINIC into the Factor box and click
the Compare Factor button. You have a choice of three test
statistics for testing the equality of survival functions across
CLINIC. Select all three (log rank, Breslow, and Tarone-Ware)
for comparison and click Continue. Click the Options button
to request plots. There are four choices (unfortunately log-log
survival plots are not included). Select Survival to obtain
KM plots by clinic. Click Continue and then OK to view the
output. To edit (and better view) the output, right-click on the
output and then select SPSS Rtf Document Object → Open
in order to open up a text output window.

The syntax follows. 

KM
  survt BY clinic /STATUS=status(1)
  /PRINT TABLE MEAN
  /PLOT SURVIVAL
  /TEST LOGRANK BRESLOW TARONE
  /COMPARE OVERALL POOLED .


The output containing the KM estimates for the first four
event-times from CLINIC = 1 and CLINIC = 2 as well as
the log rank, Breslow, and Tarone-Ware tests follows.


Survival Analysis for SURVT   Survival time (days)

Factor CLINIC = 1.00

                    Cumulative   Standard   Cumulative   Number
Time     Status     Survival     Error      Events       Remaining
 2.00    censored                             0           162
 7.00    endpoint   .9938        .0062        1           161
17.00    endpoint   .9877        .0087        2           160
19.00    endpoint   .9815        .0106        3           159
28.00    censored                             3           158
28.00    censored                             3           157
29.00    endpoint   .9752        .0122        4           156

Factor CLINIC = 2.00

                    Cumulative   Standard   Cumulative   Number
Time     Status     Survival     Error      Events       Remaining
 2.00    censored                             0           74
13.00    endpoint   .9865        .0134        1           73
26.00    endpoint   .9730        .0189        2           72
35.00    endpoint   .9595        .0229        3           71
41.00    endpoint   .9459        .0263        4           70

Test Statistics for Equality of Survival Distributions for CLINIC

               Statistic   df   Significance
Log Rank       27.89       1    .0000
Breslow        11.63       1    .0007
Tarone-Ware    17.60       1    .0000


Note that what SPSS calls the Breslow test statistic is 
equivalent to what Stata (and SAS) call the Wilcoxon test 
statistic. 
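
The Kaplan-Meier estimates and standard errors printed for CLINIC = 1 can be reproduced by hand. The following is a minimal sketch in Python (not SPSS syntax). It assumes 163 subjects at risk at time zero, which is implied by the 162 remaining after the first censored observation, and it uses Greenwood's formula for the standard error; that choice is an assumption that happens to match the printed values. The loop handles one record at a time, which is adequate here because no event time is tied with another record.

```python
import math

# (time, status) for the first seven CLINIC = 1 records; 1 = event, 0 = censored
records = [(2, 0), (7, 1), (17, 1), (19, 1), (28, 0), (28, 0), (29, 1)]

n_at_risk = 163      # assumed starting size (162 remain after the first record)
survival = 1.0
greenwood_sum = 0.0
estimates = []

for time, status in records:
    if status == 1:
        # Product-limit step: multiply by the conditional survival probability.
        survival *= (n_at_risk - 1) / n_at_risk
        # Greenwood term d / (n * (n - d)) with d = 1 event.
        greenwood_sum += 1.0 / (n_at_risk * (n_at_risk - 1))
        std_error = survival * math.sqrt(greenwood_sum)
        estimates.append((time, round(survival, 4), round(std_error, 4)))
    n_at_risk -= 1   # the subject leaves the risk set either way

for row in estimates:
    print(row)
```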

Life table estimates can be obtained by selecting Analyze →
Survival → Life Tables. The time-to-event and status variables
are defined similarly to those described above for Kaplan-Meier
estimates. However, with life tables, SPSS presents a
Display Time Intervals box. This allows the user to define the
time intervals used for the life table analysis. For example, 0
to 1000 by 100 would define 10 time intervals of equal length.
Life table plots can similarly be requested as described above
for the KM plots.

2. ASSESSING THE PH ASSUMPTION USING 
KAPLAN-MEIER LOG-LOG SURVIVAL CURVES 

SPSS does not provide unadjusted KM log-log curves by directly
using the point and click approach with the KM command.
SPSS does provide adjusted log-log curves from running
a stratified Cox model (described later in the stratified
Cox section). A log-log curve equivalent to the unadjusted KM
log-log curve can be obtained in SPSS by running a stratified
Cox without including any covariates in the model. In this
section, however, we illustrate how new variables can be defined
in the working dataset and then used to plot unadjusted
log-log KM plots.

First a variable is created containing the KM survival estimates.
Then another new variable is created containing the
log-log of the survival estimates. Finally the log-log survival
estimates are plotted against survival time to see if the curves
for CLINIC = 1 and CLINIC = 2 are parallel. Each step can
be done with the point-and-click approach or by typing in the
code directly.

A variable containing the survival estimates can be created
by selecting Analyze → Survival → Kaplan-Meier and then
selecting SURVT as the time-to-event variable, STATUS as the
status variable, and CLINIC as the factor variable as described
above. Then click the Save button. This opens a dialog box
called Kaplan-Meier Save New Variables. Check Survival and
click on Continue and then on Paste. The code that is created
is as follows.

KM
  survt BY clinic /STATUS=status(1)
  /PRINT TABLE MEAN
  /SAVE SURVIVAL .

By submitting this code, a new variable containing the KM
estimates called sur_1 is created. To create a new variable
called lls containing the log(-log) of sur_1 submit the following
code.

COMPUTE lls = LN(-LN(sur_1)) .
EXECUTE .

The above code could also be generated by selecting Transform
→ Compute and defining the new variable in the dialog
box. To plot lls against survival time submit the code:

GRAPH
  /SCATTERPLOT(BIVAR)=survt WITH lls BY clinic
  /MISSING=LISTWISE .

This final piece of code could also be run by selecting
Graphs → Scatter and then clicking on Simple and then Define
in the Scatterplot dialogue box. Select LLS for the Y-axis,
SURVT for the X-axis, and CLINIC in the Set Marker By box.
Clicking on Paste creates the code or clicking OK submits the
program. A plot of LLS against log(SURVT) could similarly
be created. Parallel curves support the PH assumption for
CLINIC.
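
The log-log transformation itself is a one-line computation. The sketch below (Python, for illustration only) applies it to the four CLINIC = 1 survival estimates printed earlier and also records the log of time, since either time scale can be used on the horizontal axis. Returning None where the transformation is undefined is a choice made for this sketch.

```python
import math

def log_minus_log(s):
    """ln(-ln(S)) for a survival estimate strictly between 0 and 1."""
    if not 0.0 < s < 1.0:
        return None   # undefined when S is 0 or 1
    return math.log(-math.log(s))

# (time, KM estimate) pairs for CLINIC = 1 from the earlier KM output
km_clinic1 = [(7, 0.9938), (17, 0.9877), (19, 0.9815), (29, 0.9752)]

# Each point: (time, log of time, log-log survival)
points = [(t, math.log(t), log_minus_log(s)) for t, s in km_clinic1]
for point in points:
    print(point)
```

Because the survival estimate falls as time increases, the transformed values rise with time; the PH check compares whether the two clinics' curves rise in parallel.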

3. RUNNING A COX PH MODEL

A Cox PH model can be run by selecting Analyze → Survival
→ Cox Regression. Select the survival time variable
(SURVT) from the variable list and enter it into the Time box,
then select the variable STATUS and enter it into the Status
box. You will then see a question mark in parentheses after the
status variable indicating that the value of the event needs to
be entered. Click the Define Event button and insert the value
1 in the box because the variable STATUS is coded 1 for events
and 0 for censorships. Click on Continue and select PRISON,
DOSE, and CLINIC from the variable list and enter them into
the Covariates box. You can click on Plots or Options to explore
some of the options (e.g., 95% CI for exp(β)). Click OK
to view the output or click on Paste to see the code. The code
follows.


COXREG
  survt /STATUS=status(1)
  /METHOD=ENTER prison dose clinic
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) .

Note that the proportional hazards assumption is assumed to 
hold for all three covariates using this Cox model (the output 
follows). 


Omnibus Tests of Model Coefficients (a,b)

                                         Change From               Change From
-2 Log       Overall (score)             Previous Step             Previous Block
Likelihood   Chi-square   df   Sig.      Chi-square   df   Sig.    Chi-square   df   Sig.
1346.805     56.273       3    .000      64.519       3    .000    64.519       3    .000

a Beginning Block Number 0, initial Log Likelihood function: -2 Log likelihood: 1411.324
b Beginning Block Number 1. Method = Enter


Variables in the Equation

          B        SE      Wald     df   Sig.   Exp(B)
PRISON     .327    .167     3.813   1    .051   1.386
DOSE      -.035    .006    30.785   1    .000    .965
CLINIC   -1.009    .215    22.045   1    .000    .365
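
Two quantities in this output can be checked with simple arithmetic. The "Change From Previous Block" chi-square is the difference between the initial and final -2 log likelihoods, and Exp(B) is the exponentiated coefficient. The sketch below (Python, for illustration) also computes a Wald 95% confidence interval for the CLINIC hazard ratio; the interval is not printed above and the multiplier 1.96 is the usual normal-approximation assumption.

```python
import math

neg2ll_null, neg2ll_model = 1411.324, 1346.805
lr_chi_square = neg2ll_null - neg2ll_model   # 3 df: PRISON, DOSE, CLINIC

def hr_with_ci(b, se, z=1.96):
    """Hazard ratio exp(b) with a Wald 95% confidence interval."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# CLINIC row of the "Variables in the Equation" table
hr, lower, upper = hr_with_ci(-1.009, 0.215)
print(round(lr_chi_square, 3), round(hr, 3), round(lower, 3), round(upper, 3))
```

Because the printed coefficients are rounded to three decimals, an exponentiated value may differ from the printed Exp(B) in the last digit for some rows.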

4. RUNNING A STRATIFIED COX MODEL AND
OBTAINING COX ADJUSTED LOG-LOG CURVES

A stratified Cox model is run by selecting Analyze → Survival
→ Cox Regression. Select the survival time variable
(SURVT) from the variable list and enter it into the Time box.
Select the variable STATUS and enter it into the Status box
and then define the value of the event as 1. Put the variables
PRISON and DOSE in the Covariates box and the variable
CLINIC in the Strata box. The Cox model will be stratified
by CLINIC. Click the Plots button and check Log minus log
as the plot type and then click on Continue. Click on OK to
view the output or click on Paste to see the code. The code
follows.


COXREG
  survt /STATUS=status(1) /STRATA=clinic
  /METHOD=ENTER prison dose
  /PLOT LML
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) .

The output containing the parameter estimates and the adjusted
log-log plots follows.


Variables in the Equation

          B       SE      Wald     df   Sig.   Exp(B)
PRISON    .389    .169     5.298   1    .021   1.475
DOSE     -.035    .006    29.552   1    .000    .965

[Figure: LML Functions at mean of covariates, one curve per CLINIC
stratum; horizontal axis: Survival time (days)]

Notice that there are parameter estimates for PRISON and
DOSE but not CLINIC because CLINIC is the stratified variable.
The Cox adjusted log-log plots are fitted using the mean
values of PRISON and DOSE and are used to evaluate the PH
assumption for CLINIC.

Suppose rather than using the mean value of DOSE for the adjusted
log-log plots, you wish to obtain adjusted plots in which
DOSE = 70. Run the same code as before, up to clicking on
the Plots button and checking Log minus log as the plot type.
Now click on DOSE(Mean) in the window called Covariate
Values Plotted at. Type in the value 70 in the box underneath
called Change Value and click on the button called Change.
Now the variable in the window should be called DOSE(70)
rather than DOSE(Mean). Click on Continue and then OK to
view the output.

5. ASSESSING THE PH ASSUMPTION WITH 
A STATISTICAL TEST 

SPSS does not easily accommodate a statistical test on the
proportional hazards assumption using the Schoenfeld residuals.
However, it can be programmed using several steps. The
steps are as follows.

1. Run a Cox PH model to obtain the Schoenfeld residuals for
all the covariates. These residuals are saved as new variables
in the working dataset.

2. Delete observations that were censored.

3. Create a variable that contains the ranked order of survival
time. For example, the subject who had the fourth event
gets a value of 4 for this variable.

4. Run correlations on the survival rankings with the Schoenfeld
residuals.

5. The p-value for testing whether the correlation is zero between
the ranked survival time and the covariate's Schoenfeld
residuals is the p-value for the statistical test of the
PH assumption. The null hypothesis is that the
PH assumption is not violated.
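
Steps 2 through 4 can be sketched outside SPSS as follows. This is a minimal Python illustration, not the SPSS procedure: the residuals are assumed to have been saved already (step 1), the function names are hypothetical, and ties in survival time receive their mean rank.

```python
import math

def mean_ranks(values):
    """Rank values in ascending order, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1   # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ph_correlation(times, status, residuals):
    """Keep events only, rank their times, and correlate with the residuals."""
    kept = [(t, r) for t, s, r in zip(times, status, residuals) if s == 1]
    ranks = mean_ranks([t for t, _ in kept])
    return pearson(ranks, [r for _, r in kept])

# Hypothetical toy data, for illustration only
r = ph_correlation([10, 20, 30, 40, 50], [1, 0, 1, 1, 1],
                   [0.4, 9.0, 0.1, -0.2, -0.3])
print(round(r, 3))
```

A correlation far from zero suggests that the covariate's effect drifts over time, which is what the test in step 5 assesses.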

First, run a Cox PH model with CLINIC, PRISON, and DOSE.
Click on the Save button before submitting the model. A dialog
box appears that is called Cox Regression: Save New Variables.
Check Partial Residuals (under Diagnostics) and click
on Continue. This creates three new variables in the working
dataset called pr1_1, pr2_1, and pr3_1, which are the partial
residuals for CLINIC, PRISON, and DOSE, respectively. Click
OK to run the model (or Paste to generate the code).

Next, delete all censored observations (i.e., only keep observations
in which STATUS = 1). To do this, select Data → Select
Cases. Then check If condition is satisfied, and then click on If.
Type status=1 in the dialog box and click on Continue. Check
Deleted in the box called Unselected Cases Are. Click OK and
only observations with events will be kept in the dataset.

Create the variable that contains the ranking of survival times
by selecting Transform → Rank Cases. Select the survival
time variable (SURVT) in the Variables box. Click on Rank
Types, check Ranks, and click on Continue and then click on
Ties, check Mean, and click Continue. Click OK and a new
variable (called rsurvt) will be created containing the ranked
survival time.

Finally, obtain correlations (and their p-values) between the
ranked survival and the Schoenfeld residuals. Select Analyze
→ Correlate → Bivariate. Move the ranked survival time
variable as well as the three partial residual variables into the
variable box. Check Pearson (for Pearson correlations) and
Two-tailed for a two-tail test of significance and click OK to
see the output. The code that is generated from these steps
follows.


COXREG
  survt /STATUS=status(1)
  /METHOD=ENTER clinic prison dose
  /SAVE=PRESID
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) .

FILTER OFF.
USE ALL.
SELECT IF(status=1).
EXECUTE .

RANK
  VARIABLES=survt (A) /RANK /PRINT=YES
  /TIES=MEAN .

CORRELATIONS
  /VARIABLES=rsurvt pr1_1 pr2_1 pr3_1
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE .

The output containing the correlations follows.

Correlations

                                         RANK of   Partial      Partial      Partial
                                         SURVT     Residual     Residual     Residual
                                                   for CLINIC   for PRISON   for DOSE
RANK of SURVT      Pearson Correlation   1         -.262**      -.080        .077
                   Sig. (2-tailed)       -         .001         .332         .347
                   N                     150       150          150          150
Partial residual   Pearson Correlation   -.262**   1            .010         .023
for CLINIC         Sig. (2-tailed)       .001      -            .904         .776
                   N                     150       150          150          150
Partial residual   Pearson Correlation   -.080     .010         1            .171*
for PRISON         Sig. (2-tailed)       .332      .904         -            .037
                   N                     150       150          150          150
Partial residual   Pearson Correlation   .077      .023         .171*        1
for DOSE           Sig. (2-tailed)       .347      .776         .037         -
                   N                     150       150          150          150

** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).


The p-values for the correlations are the p-values for the PH
test. In the output, examine the Sig. (2-tailed) row for RANK of
SURVT. Notice that the null hypothesis is rejected for
CLINIC (p = 0.001) but not for PRISON (p = 0.332) or DOSE
(p = 0.347).
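
The link between each correlation and its p-value can be illustrated with a rough calculation. The sketch below (Python, for illustration) converts a correlation r based on n = 150 events into a test statistic and uses a standard normal approximation for the two-sided p-value. The normal approximation is an assumption of this sketch; an exact test would use a t distribution with n - 2 degrees of freedom, so the results agree with the printed values only approximately.

```python
import math

def correlation_p_value(r, n):
    """Two-sided p-value for H0: correlation = 0 (normal approximation)."""
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))

p_values = {
    name: correlation_p_value(r, 150)
    for name, r in [("CLINIC", -0.262), ("PRISON", -0.080), ("DOSE", 0.077)]
}
for name, p in p_values.items():
    print(name, round(p, 3))
```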

6. RUNNING AN EXTENDED COX MODEL 

An extended Cox model with exactly one time-dependent covariate
can be run using the point and click approach. Suppose
we want to include a time-dependent covariate DOSE
times the log of survival time. This product term could be
appropriate if the hazard ratio comparing any two levels of
DOSE monotonically increases (or decreases) over time. Select
Analyze → Survival → Cox w/ Time-Dep Cov. This opens
a dialog box called Expression for T_COV_. The user defines a
time-dependent variable (called T_COV_) in this box. A variable
T_ is included in the variable list. This is the variable
that represents time-varying survival (as opposed to SURVT
which is an individual's fixed time of event). We wish to define
T_COV_ to be the log of T_ times DOSE. Enter the expression
LN(T_)*DOSE into the dialog box and click on the Model
button. Now run a Cox model that includes the covariates:
PRISON, CLINIC, DOSE, and T_COV_. The code generated is
as follows.

TIME PROGRAM.
COMPUTE T_COV_ = LN(T_) * dose .
COXREG
  survt /STATUS=status(1)
  /METHOD=ENTER prison clinic dose T_COV_
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) .


The output containing the parameter estimates follows. 


Variables in the Equation

          B        SE      Wald     df   Sig.   Exp(B)
PRISON     .340    .167     4.134   1    .042   1.406
CLINIC   -1.019    .215    22.369   1    .000    .361
DOSE      -.082    .036     5.247   1    .022    .921
T_COV_     .009    .006     1.765   1    .184   1.009


The variable T_COV_ represents the time-dependent variable 
included in the model, which in this example is DOSE times 
the log of survival time. 
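
With the product term in the model, the estimated hazard ratio for DOSE is no longer a single number but a function of time. The sketch below (Python, for illustration) evaluates it from the rounded coefficients printed above; the chosen days are arbitrary.

```python
import math

B_DOSE, B_TCOV = -0.082, 0.009   # DOSE and T_COV_ rows above

def dose_hazard_ratio(t, delta=1.0):
    """Estimated hazard ratio for a delta-unit increase in DOSE at day t."""
    return math.exp(delta * (B_DOSE + B_TCOV * math.log(t)))

for day in (1, 100, 365, 1000):
    print(day, round(dose_hazard_ratio(day), 3))
```

At day 1 the log of time is zero, so the ratio reduces to exp(-0.082); it then moves toward 1 as time increases, although the T_COV_ coefficient is not statistically significant in this output (p = .184).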

A Heaviside function for CLINIC can similarly be created. We
can define a time-dependent variable equal to CLINIC if time
is greater than or equal to 365 days and 0 otherwise. Select
Analyze → Survival → Cox w/ Time-Dep Cov. Define T_COV_
to be (T_ >= 365)*CLINIC. After clicking on the Model button,
run a Cox model that includes PRISON, DOSE, CLINIC, and
T_COV_. The code generated is as follows.

TIME PROGRAM.
COMPUTE T_COV_ = (T_ >= 365) * clinic .
COXREG
  survt /STATUS=status(1)
  /METHOD=ENTER prison dose clinic T_COV_
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) .

Note that SPSS recognizes the expression (T_ >= 365) as taking
the value 1 if survival time is greater than or equal to
365 days and 0 otherwise.

The output follows.

Variables in the Equation

          B        SE      Wald     df   Sig.   Exp(B)
PRISON     .378    .168     5.030   1    .025   1.459
CLINIC    -.460    .255     3.241   1    .072    .632
DOSE      -.036    .006    30.450   1    .000    .965
T_COV_   -1.369    .461     8.799   1    .003    .254


Notice the variable CLINIC is included in this model and
the time-dependent Heaviside function T_COV_ does not contribute
to the estimated hazard ratio until day 365. The estimated
hazard ratio for CLINIC at 100 days is exp(-0.460) =
0.632 and the estimated hazard ratio for CLINIC at 400 days
is exp((-0.460) + (-1.369)) = 0.161.
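
The same calculation can be written as a function of time. The following Python sketch (for illustration only) uses the rounded coefficients printed above, so its results match the quoted hazard ratios to about two decimal places rather than exactly.

```python
import math

B_CLINIC, B_TCOV = -0.460, -1.369   # CLINIC and T_COV_ rows above

def clinic_hazard_ratio(t, cutpoint=365):
    """Estimated hazard ratio for CLINIC = 2 vs. CLINIC = 1 at day t."""
    heaviside = 1 if t >= cutpoint else 0   # switches on at the cutpoint
    return math.exp(B_CLINIC + B_TCOV * heaviside)

for day in (100, 364, 365, 400):
    print(day, round(clinic_hazard_ratio(day), 3))
```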

It may be of interest to define two Heaviside functions (with
CLINIC) and not include CLINIC in the model. This is essentially
the same model as the one described above with one
Heaviside function. However, the coding of two Heaviside
functions makes it somewhat computationally more convenient
for estimating the two hazard ratios for CLINIC (HR
for <365 days and HR for ≥365 days). Unfortunately, SPSS
allows just one time-dependent variable (i.e., T_COV_) using
the point and click approach. However, by examining the code
created for the single Heaviside function, there is only a slight
adjustment needed to create code for two Heaviside functions.
The following code creates two Heaviside functions (called
HV1 and HV2) and runs a model containing PRISON, DOSE,
HV1, and HV2.

TIME PROGRAM.
COMPUTE hv2 = (T_ >= 365) * clinic .
COMPUTE hv1 = (T_ < 365) * clinic .
COXREG
  survt /STATUS=status(1)
  /METHOD=ENTER prison dose hv1 hv2
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) .
The output follows. 


Variables in the Equation 

             B       SE      Wald    df    Sig.   Exp(B) 
PRISON      .378    .168     5.030    1    .025    1.459 
DOSE       -.036    .006    30.450    1    .000     .965 
HV1        -.460    .255     3.241    1    .072     .632 
HV2       -1.828    .386    22.439    1    .000     .161 


The parameter estimates for HV1 and HV2 can be used 
directly to obtain the estimated hazard ratio for CLINIC = 2 
vs. CLINIC = 1 before and after 365 days. The estimated 
hazard ratio for CLINIC at 100 days is exp(-0.460) = 0.632 
and the estimated hazard ratio for CLINIC at 400 days is 
exp(-1.828) = 0.161. These results are consistent with the 
estimates obtained from the previous model with one 
Heaviside function. 
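
To see why the two codings describe the same model, it may 
help to compare the CLINIC part of the linear predictor under 
each coding. The sketch below is illustrative Python, not SPSS 
output; the function names are assumptions, and the 
coefficients are the rounded estimates printed in the two 
tables, so the two codings agree only up to rounding. 

```python
# Rounded coefficients from the two SPSS outputs in this section.
B_CLINIC, B_T_COV = -0.460, -1.369   # model with one Heaviside function
B_HV1, B_HV2 = -0.460, -1.828        # model with two Heaviside functions

def heaviside_pair(t_days, clinic, cutpoint=365):
    """Return (hv1, hv2) as defined by the two COMPUTE statements."""
    hv1 = clinic if t_days < cutpoint else 0
    hv2 = clinic if t_days >= cutpoint else 0
    return hv1, hv2

def clinic_term_one_function(t_days, clinic, cutpoint=365):
    """CLINIC contribution to the log hazard with CLINIC and T_COV_."""
    t_cov = clinic if t_days >= cutpoint else 0
    return B_CLINIC * clinic + B_T_COV * t_cov

def clinic_term_two_functions(t_days, clinic, cutpoint=365):
    """CLINIC contribution to the log hazard with HV1 and HV2."""
    hv1, hv2 = heaviside_pair(t_days, clinic, cutpoint)
    return B_HV1 * hv1 + B_HV2 * hv2

for t in (100, 400):
    one = clinic_term_one_function(t, clinic=1)
    two = clinic_term_two_functions(t, clinic=1)
    print(t, round(one, 3), round(two, 3))
```

After day 365 the single-function coding adds the two 
coefficients (-0.460 + -1.369), which matches the HV2 
coefficient up to rounding of the printed estimates. 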
Chapter 1 


True-False Questions 

1. T 

2. T 

3. T 

4. F: Step function. 

5. F: Ranges between 0 and 1. 

6. T 

7. T 

8. T 

9. T 

10. F: Median survival time is longer for group 1 than for 
group 2. 

11. F: Six weeks or greater. 

12. F: The risk set at 7 weeks contains 15 persons. 

13. F: Hazard ratio. 

14. T 

15. T 

16. h(t) gives the instantaneous potential per unit time for the 
event to occur given that the individual has survived up to 
time t; h(t) is greater than or equal to 0; h(t) has no upper 
bound. 

17. Hazard functions 

• give insight about conditional failure rates; 

• help to identify specific model forms (e.g., exponen¬ 
tial, Weibull); 

• are used to specify mathematical models for survival 
analysis. 

18. Three goals of survival analysis are: 

• to estimate and interpret survivor and/or hazard 
functions; 

• to compare survivor and/or hazard functions; 

• to assess the relationship of explanatory variables to 
survival time. 
19. 
          t(j)    m_j   q_j   R(t(j)) 
Group 1:   0       0     0    25 persons survive >= 0 years 
           1.8     1     0    25 persons survive >= 1.8 years 
           2.2     1     0    24 persons survive >= 2.2 years 
           2.5     1     0    23 persons survive >= 2.5 years 
           2.6     1     0    22 persons survive >= 2.6 years 
           3.0     1     0    21 persons survive >= 3.0 years 
           3.5     1     0    20 persons survive >= 3.5 years 
           3.8     1     0    19 persons survive >= 3.8 years 
           5.3     1     0    18 persons survive >= 5.3 years 
           5.4     1     0    17 persons survive >= 5.4 years 
           5.7     1     0    16 persons survive >= 5.7 years 
           6.6     1     0    15 persons survive >= 6.6 years 
           8.2     1     0    14 persons survive >= 8.2 years 
           8.7     1     0    13 persons survive >= 8.7 years 
           9.2     2     0    12 persons survive >= 9.2 years 
           9.8     1     0    10 persons survive >= 9.8 years 
          10.0     1     0     9 persons survive >= 10.0 years 
          10.2     1     0     8 persons survive >= 10.2 years 
          10.7     1     0     7 persons survive >= 10.7 years 
          11.0     1     0     6 persons survive >= 11.0 years 
          11.1     1     0     5 persons survive >= 11.1 years 
          11.7     1     3     4 persons survive >= 11.7 years 


20. a. Group 1 has a better survival prognosis than group 2 
because group 1 has a higher average survival time 
and a correspondingly lower average hazard rate than 
group 2. 

b. The average survival time and average hazard rates give 
overall descriptive statistics. The survivor curves allow 
one to make comparisons over time. 


Chapter 2 


1. a. KM plots and the log rank statistic for the cell type 
1 variable in the vets.data dataset are shown below. 

[Figure: KM plots for the two groups of the cell type 1 variable] 
Group    Events observed    Events expected 
1             102                93.45 
2              26                34.55 
Total         128               128.00 

Log rank = chi2(1) = 3.02 
P-value = Pr > chi2 = 0.0822 


The KM curves indicate that persons with large cell type 
have a consistently better prognosis than persons with 
other cell types, although the two curves are essentially 
the same very early on and after 250 days. The log rank 
test is not significant at the .05 level, which gives 
somewhat equivocal findings. 
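
As a rough check on the log rank result for the two-group 
comparison, one can use the observed and expected counts 
directly. The Python sketch below is an illustration under 
stated assumptions: it uses the common approximation 
sum((O - E)^2 / E), which is not the exact statistic Stata 
reports and typically comes out somewhat smaller, and it uses 
the fact that the upper-tail probability of a chi-square 
variable with 1 degree of freedom equals erfc(sqrt(x/2)). The 
function names are hypothetical. 

```python
import math

def approx_logrank(observed, expected):
    """Approximate log rank statistic: sum of (O - E)^2 / E over groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 d.f."""
    return math.erfc(math.sqrt(x / 2.0))

# Observed and expected events from the two-group table above.
observed = [102, 26]
expected = [93.45, 34.55]

approx_stat = approx_logrank(observed, expected)
p_from_reported = chi2_sf_1df(3.02)   # p-value for the reported statistic

print(f"approximate statistic: {approx_stat:.2f}")
print(f"p-value for 3.02 with 1 d.f.: {p_from_reported:.4f}")
```

The p-value computed from the reported statistic with 1 degree 
of freedom agrees with the printed value of 0.0822, which is 
what one expects when two groups are compared. 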

b. KM plots and the log rank statistic for the four 
categories of cell type are shown below. 



The KM curves suggest that persons with adeno or small 
cell types have a poorer survival prognosis than persons 
with large or squamous cell types. Moreover, there does 
not appear to be a meaningful difference between adeno 
or small cell types. Also, persons with squamous cell 
type seem to have, on the whole, a better prognosis than 
persons with large cell type. 

Computer results from Stata giving log rank statistics 
are now shown. 


Group    Events observed    Events expected 
1              26                34.55 
2              26                15.69 
3              45                30.10 
4              31                47.65 
Total         128               128.00 

Log rank = chi2(3) = 25.40 
P-value = Pr > chi2 = 0.0000 
The log rank test yields a highly significant p-value, 
indicating that there is some overall difference among 
the four curves; that is, the null hypothesis that the four 
groups have a common survival curve is rejected. 

2. a. KM plots for the two clinics are shown below. These 
plots indicate that patients in clinic 2 have a 
consistently better prognosis for remaining under treatment 
than do patients in clinic 1. Moreover, it appears that 
the difference between the two clinics is small before 
one year of follow-up but diverges after one year of 
follow-up. 



b. The log rank statistic (27.893) and Wilcoxon statistic 
(11.63) are both significant well below the .01 level, 
indicating that the survival curves for the two clinics are 
significantly different. The log rank statistic is 
nevertheless much larger than the Wilcoxon statistic, which 
makes sense because the log rank statistic emphasizes 
the later survival experience, where the two survival 
curves are far apart, whereas the Wilcoxon statistic 
emphasizes earlier survival experience, where the two 
survival curves are closer together. 

c. If methadone dose is categorized into high (70+), 
medium (55-70) and low (<55), we obtain the KM 
curves shown below. 
The KM curves indicate that persons with high doses have 
a consistently better survival prognosis (i.e., maintenance) 
than persons with medium or low doses. The latter two 
groups are not very different from each other, although the 
medium dose group has a somewhat better prognosis up 
to the first 400 days of follow-up. 

The log rank test statistic is shown below for the above 
categorization scheme. 

Group    Events observed    Events expected 
0              45                30.93 
1              74                54.09 
2              31                64.99 
Total         150               150.00 

Log rank = chi2(2) = 33.02 
P-value = Pr > chi2 = 0.0000 

The test statistic is highly significant, indicating that these 
three curves are not equivalent. 


Chapter 3 


1. a. $h(t,\mathbf{X}) = h_0(t)\exp[\beta_1 T1 + \beta_2 T2 + \beta_3 PS + \beta_4 DC 
+ \beta_5 BF + \beta_6(T1 \times PS) + \beta_7(T2 \times PS) 
+ \beta_8(T1 \times DC) + \beta_9(T2 \times DC) 
+ \beta_{10}(T1 \times BF) + \beta_{11}(T2 \times BF)]$ 

b. Intervention A: X* = (1, 0, PS, DC, BF, PS, 0, DC, 0, 
BF, 0) 

Intervention C: X = (-1, -1, PS, DC, BF, -PS, -PS, 
-DC, -DC, -BF, -BF) 

$HR = \frac{h(t,\mathbf{X}^*)}{h(t,\mathbf{X})} = \exp[2\beta_1 + \beta_2 + 2\beta_6 PS + \beta_7 PS 
+ 2\beta_8 DC + \beta_9 DC + 2\beta_{10} BF + \beta_{11} BF]$ 
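
The exponent in this hazard ratio is the sum of each coefficient 
times the difference X* - X. The following Python sketch 
illustrates that bookkeeping; the covariate values for PS, DC, 
and BF and the coefficient values are hypothetical, chosen only 
so that the arithmetic can be followed, and the function names 
are assumptions. 

```python
def design_vector(t1, t2, ps, dc, bf):
    """Covariate vector for the full model: main effects, then products."""
    return [t1, t2, ps, dc, bf,
            t1 * ps, t2 * ps, t1 * dc, t2 * dc, t1 * bf, t2 * bf]

def log_hazard_ratio(betas, x_star, x):
    """Sum of beta_i * (X*_i - X_i) over all terms in the model."""
    return sum(b * (xs - xo) for b, xs, xo in zip(betas, x_star, x))

# Hypothetical covariate values and coefficients, for illustration only.
ps, dc, bf = 2, 3, 1
betas = [0.5, 0.25, 0.1, 0.1, 0.1, 0.05, 0.05, 0.02, 0.02, 0.01, 0.01]

x_star = design_vector(1, 0, ps, dc, bf)     # intervention A: T1 = 1, T2 = 0
x_ref = design_vector(-1, -1, ps, dc, bf)    # intervention C: T1 = T2 = -1
difference = [a - c for a, c in zip(x_star, x_ref)]

log_hr = log_hazard_ratio(betas, x_star, x_ref)
# Closed form from the answer: 2b1 + b2 + (2b6 + b7)PS + (2b8 + b9)DC + (2b10 + b11)BF
closed_form = (2 * betas[0] + betas[1]
               + (2 * betas[5] + betas[6]) * ps
               + (2 * betas[7] + betas[8]) * dc
               + (2 * betas[9] + betas[10]) * bf)
print(difference)
print(round(log_hr, 6), round(closed_form, 6))
```

The main-effect terms for PS, DC, and BF have zero difference 
and therefore drop out of the hazard ratio. 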

c. $H_0: \beta_6 = \beta_7 = \beta_8 = \beta_9 = \beta_{10} = \beta_{11} = 0$ in the full 
model. 

Likelihood ratio test statistic: $-2\ln L_R - (-2\ln L_F)$, 
which is approximately $\chi^2_6$ under $H_0$, where R denotes 
the reduced model (containing no product terms) 
under $H_0$, and F denotes the full model (given in Part 1a 
above). 

d. The two models being compared are: 

Full model (F): $h(t,\mathbf{X}) = h_0(t)\exp[\beta_1 T1 + \beta_2 T2 
+ \beta_3 PS + \beta_4 DC + \beta_5 BF]$ 

Reduced model (R): $h(t,\mathbf{X}) = h_0(t)\exp[\beta_3 PS 
+ \beta_4 DC + \beta_5 BF]$ 

$H_0: \beta_1 = \beta_2 = 0$ in the full model 

Likelihood ratio test statistic: $-2\ln L_R - (-2\ln L_F)$, 
which is approximately $\chi^2_2$ under $H_0$. 


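Intervention A: 

$\hat{S}(t,\mathbf{X}) = [\hat{S}_0(t)]^{\exp[\hat\beta_1 + (PS)\hat\beta_3 + (DC)\hat\beta_4 + (BF)\hat\beta_5]}$ 

Intervention B: 

$\hat{S}(t,\mathbf{X}) = [\hat{S}_0(t)]^{\exp[\hat\beta_2 + (PS)\hat\beta_3 + (DC)\hat\beta_4 + (BF)\hat\beta_5]}$ 

Intervention C: 

$\hat{S}(t,\mathbf{X}) = [\hat{S}_0(t)]^{\exp[-\hat\beta_1 - \hat\beta_2 + (PS)\hat\beta_3 + (DC)\hat\beta_4 + (BF)\hat\beta_5]}$ 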

2. a. $h(t,\mathbf{X}) = h_0(t)\exp[\beta_1 CHR + \beta_2 AGE + \beta_3(CHR \times AGE)]$ 

b. $H_0: \beta_3 = 0$ 

LR statistic = 264.90 - 264.70 = 0.21; $\chi^2$ with 1 d.f. 
under $H_0$; not significant. 

Wald statistic gives a chi-square value of .01, also not 
significant. Conclusions about interaction: the model 
should not contain an interaction term. 

c. When AGE is controlled (using the gold standard model 
2), the hazard ratio for the effect of CHR is exp(.8051) = 
2.24, whereas when AGE is not controlled, the 
hazard ratio for the effect of CHR (using Model 1) is 
exp(.8595) = 2.36. Thus, the hazard ratios are not 
appreciably different, so AGE is not a confounder. 
Regarding precision, the 95% confidence interval for 
the effect of CHR in the gold standard model (Model 
2) is given by exp[.8051 ± 1.96(.3252)] = (1.183, 4.231) 
whereas the corresponding 95% confidence interval 
in the model without AGE (Model 1) is given by 
exp[.8595 ± 1.96(.3116)] = (1.282, 4.350). Both 
confidence intervals have about the same width, with the 
latter interval being slightly wider. Thus, controlling for 
AGE has little effect on the final point and interval 
estimates of interest. 
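
The point and interval estimates above follow from the usual 
large-sample formula exp[b ± 1.96 se(b)]. A minimal Python 
sketch of that calculation is given here; the function name 
hr_with_ci is an illustrative assumption, and the inputs are the 
coefficients and standard errors quoted in this answer. 

```python
import math

def hr_with_ci(beta, se, z=1.96):
    """Hazard ratio and Wald-type 95% confidence limits from a Cox coefficient."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

# Model 2 (AGE controlled) and Model 1 (AGE not controlled), as quoted above.
hr2, lo2, hi2 = hr_with_ci(0.8051, 0.3252)
hr1, lo1, hi1 = hr_with_ci(0.8595, 0.3116)

print(f"Model 2: HR = {hr2:.2f}, 95% CI = ({lo2:.3f}, {hi2:.3f})")
print(f"Model 1: HR = {hr1:.2f}, 95% CI = ({lo1:.3f}, {hi1:.3f})")
print(f"widths: {hi2 - lo2:.3f} vs {hi1 - lo1:.3f}")
```

Comparing the two widths shows how little precision changes 
when AGE is dropped from the model. 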

d. If the hazard functions cross for the two levels of the 
CHR variable, this would mean that none of the models 
provided is appropriate, because each model assumes 
that the proportional hazards assumption is met for 
each predictor in the model. If hazard functions cross 
for CHR, however, the proportional hazards assumption 
cannot be satisfied for this variable. 



e. For CHR = 1: $\hat{S}(t,\mathbf{X}) = [\hat{S}_0(t)]^{\exp[0.8051 + 0.0856(AGE)]}$ 

For CHR = 0: $\hat{S}(t,\mathbf{X}) = [\hat{S}_0(t)]^{\exp[0.0856(AGE)]}$ 

f. Using Model 1, which is the best model, there is 
evidence of a moderate effect of CHR on survival time, 
because the hazard ratio is about 2.4 with a 95% 
confidence interval between 1.3 and 4.4, and the Wald test 
for significance of this variable is significant below the 
.01 level. 

3. a. Full model (F = Model 1): $h(t,\mathbf{X}) = h_0(t)\exp[\beta_1 Rx 
+ \beta_2 Sex + \beta_3 \log WBC + \beta_4(Rx \times Sex) 
+ \beta_5(Rx \times \log WBC)]$ 

Reduced model (R = Model 4): 

$h(t,\mathbf{X}) = h_0(t)\exp[\beta_1 Rx + \beta_2 Sex + \beta_3 \log WBC]$ 

$H_0: \beta_4 = \beta_5 = 0$ 

LR statistic = 144.218 - 139.030 = 5.19; $\chi^2$ with 2 d.f. 
under $H_0$; not significant at 0.05, though significant at 
0.10. The chunk test indicates some (though mild) 
evidence of interaction. 
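
The statement about the 0.05 and 0.10 levels can be verified 
numerically. The Python sketch below is an illustration, not 
output from the original analysis: it relies on the fact that a 
chi-square variable with 2 degrees of freedom has upper-tail 
probability exp(-x/2), and the function names are assumptions. 

```python
import math

def lr_statistic(minus2lnl_reduced, minus2lnl_full):
    """Likelihood ratio statistic: -2 ln L_R - (-2 ln L_F)."""
    return minus2lnl_reduced - minus2lnl_full

def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square variable with 2 d.f."""
    return math.exp(-x / 2.0)

# -2 ln L values quoted in the answer for Model 4 (reduced) and Model 1 (full).
lr = lr_statistic(144.218, 139.030)
p_value = chi2_sf_2df(lr)

print(f"LR = {lr:.2f}, p-value = {p_value:.3f}")
```

The resulting p-value lies between 0.05 and 0.10, which is 
the sense in which the evidence of interaction is mild. 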

b. Using either a Wald test (p-value = .776) or a LR test, 
the product term Rx x log WBC is clearly not 
significant, and thus should be dropped from Model 1. Thus, 
Model 2 is preferred to Model 1. 

c. Using Model 2, the hazard ratio for the effect of 
Rx is given by: $\widehat{HR} = \hat{h}(t,\mathbf{X}^*)/\hat{h}(t,\mathbf{X}) = \exp[0.405 
+ 2.013\,Sex]$ 

d. Males (Sex = 0): HR = exp[0.405] = 1.499 
Females (Sex = 1): HR = exp[0.405 + 2.013(1)] = 
11.223 

e. Model 2 is preferred to Model 3 if one decides that 
the coefficients for the variables Rx and Rx x Sex are 
meaningfully different for the two models. It appears 
that such corresponding coefficients (0.405 vs. 0.587 
and 2.013 vs. 1.906) are different. The estimated 
hazard ratios for Model 3 are 1.799 (males) and 12.098 
(females), which are different, but not very different from 
the estimates computed in Part 3d for Model 2. If it is 
decided that there is a meaningful difference here, then 
we would conclude that log WBC is a confounder; 
otherwise log WBC is not a confounder. Note that the log 
WBC variable is significant in Model 2 (P = .000), but 
this addresses precision and not confounding. When in 
doubt, as in this case, the safest thing to do (for validity 
reasons) is to control for log WBC. 

f. Model 2 appears to be best, because there is 
significant interaction of Rx x Sex (P = .023) and because 
log WBC is a likely confounder (from Part e). 
Chapter 4 


1. The P(PH) values in the printout provide GOF statistics 
for each variable adjusted for the other variables in the 
model. These P(PH) values indicate that the clinic variable 
does not satisfy the PH assumption (P << .01), whereas 
the prison and dose variables satisfy the PH assumption 
(P > .10). 

2. The log-log plots shown are parallel. However, the reason 
why they are parallel is because the clinic variable has been 
included in the model, and log-log curves for any variable 
in a PH model must always be parallel. If, instead, the clinic 
variable had been stratified (i.e., not included in the model), 
then the log-log plots comparing the two clinics adjusted 
for the prison and dose variables might not be parallel. 

3. The log-log plots obtained when the clinic variable is 
stratified (i.e., using a stratified Cox PH model) are not 
parallel. They intersect early on in follow-up and diverge from 
each other later in follow-up. These plots therefore 
indicate that the PH assumption is not satisfied for the clinic 
variable. 

4. Both graphs of log-log plots for the prison variable show 
curves that intersect and then diverge from each other and 
then intersect again. Thus, the plots on each graph appear 
to be quite nonparallel, indicating that the PH assumption 
is not satisfied for the prison variable. Note, however, that 
on each graph, the plots are quite close to each other, so 
that one might conclude that, allowing for random 
variation, the two plots are essentially coincident; with this latter 
point of view, one would conclude that the PH assumption 
is satisfied for the prison variable. 

5. The conclusion of nonparallel log-log plots in Question 4 
gives a different result about the PH assumption for the 
prison variable than determined from the GOF tests 
provided in Question 1. That is, the log-log plots suggest that 
the prison variable does not satisfy the PH assumption, 
whereas the GOF test suggests that the prison variable 
satisfies the assumption. Note, however, if the point of view 
is taken that the two plots are close enough to suggest 
coincidence, the graphical conclusion would be the same as 
the GOF conclusion. Although the final decision is 
somewhat equivocal here, we prefer to conclude that the PH 
assumption is satisfied for the prison variable because this 
is strongly indicated from the GOF test and questionably 
counterindicated by the log-log curves. 
6. Because maximum methadone dose is a continuous 
variable, we must categorize this variable into two or more 
groups in order to graphically evaluate whether it satisfies 
the PH assumption. Assume that we have categorized this 
variable into two groups, say, low versus high. Then, 
observed survival plots can be obtained as KM curves for low 
and high groups separately. To obtain expected plots, we 
can fit a Cox model containing the dose variable and then 
substitute suitably chosen values for dose into the formula 
for the estimated survival curve. Typically, the values 
substituted would be either the mean or median (maximum) 
dose in each group. 

After obtaining observed and expected plots for low and 
high dose groups, we would conclude that the PH 
assumption is satisfied if corresponding observed and expected 
plots are not widely discrepant from each other. If a 
noticeable discrepancy is found for at least one pair of observed 
versus expected plots, we conclude that the PH assumption 
is not satisfied. 

7. $h(t,\mathbf{X}) = h_0(t)\exp[\beta_1\,\text{clinic} + \beta_2\,\text{prison} + \beta_3\,\text{dose} 
+ \delta_1(\text{clinic} \times g(t)) + \delta_2(\text{prison} \times g(t)) 
+ \delta_3(\text{dose} \times g(t))]$ 

where g(t) is some function of time. The null 
hypothesis is given by $H_0: \delta_1 = \delta_2 = \delta_3 = 0$. The test statistic is 
a likelihood ratio statistic of the form $LR = -2\ln L_R - 
(-2\ln L_F)$ where R denotes the reduced (PH) model 
obtained when all $\delta$s are 0, and F denotes the full model 
given above. Under $H_0$, the LR statistic is approximately 
chi-square with 3 d.f. 

8. Drawbacks of the extended Cox model approach: 

• Not always clear how to specify g(t); different 
choices may give different conclusions; 

• Different modeling strategies to choose from, for 
example, might consider g(t) to be a polynomial in t 
and do a backward elimination to eliminate 
nonsignificant higher-order terms; alternatively, might 
consider g(t) to be linear in t without evaluating 
higher-order terms. 

Different strategies may yield different conclusions. 
9. $h(t,\mathbf{X}) = h_0(t)\exp[\beta_1\,\text{clinic} + \beta_2\,\text{prison} + \beta_3\,\text{dose} 
+ \delta_1(\text{clinic} \times g(t))]$ where g(t) is some function of time. 
The null hypothesis is given by $H_0: \delta_1 = 0$, and the test 
statistic is either a Wald statistic or a likelihood ratio 
statistic. The LR statistic would be of the form $LR = -2 
\ln L_R - (-2\ln L_F)$, where R denotes the reduced (PH) 
model obtained when $\delta_1 = 0$, and F denotes the full model 
given above. Either statistic is approximately chi-square 
with 1 d.f. under the null hypothesis. 

10. t > 365 days: $HR = \exp[\beta_1 + \delta_1]$ 
t <= 365 days: $HR = \exp[\beta_1]$ 

If $\delta_1$ is not equal to zero, then the model does not 
satisfy the PH assumption for the clinic variable. Thus, a test 
of $H_0: \delta_1 = 0$ evaluates the PH assumption; a significant 
result would indicate that the PH assumption is violated. 
Note that if $\delta_1$ is not equal to zero, then the model assumes 
that the hazard ratio is not constant over time by giving 
a different hazard ratio value depending on whether t is 
greater than 365 days or t is less than or equal to 365 days. 


Chapter 5 


1. By fitting a stratified Cox (SC) model that stratifies on clinic, 
we can compare adjusted survival curves for each clinic, 
adjusted for the prison and dose variables. This will allow 
us to visually describe the extent of clinic differences on 
survival over time. However, a drawback to stratifying on 
clinic is that it will not be possible to obtain an estimate of 
the hazard ratio for the effect of clinic, because clinic will 
not be included in the model. 

2. The adjusted survival curves indicate that clinic 2 has a 
better survival prognosis than clinic 1 consistently over time. 
Moreover, it seems that the difference between the effects 
of clinic 2 and clinic 1 increases over time. 

3. $h_g(t,\mathbf{X}) = h_{0g}(t)\exp[\beta_1\,\text{prison} + \beta_2\,\text{dose}]$, g = 1, 2 

This is a no-interaction model because the regression 
coefficients for prison and dose are the same for each stratum. 

4. Effect of prison, adjusted for clinic and dose: HR = 
1.475, 95% CI: (1.059, 2.054). It appears that having a 
prison record multiplies the hazard for failure by about 
1.475 relative to not having a prison record. The p-value 
is 0.021, which is significant at the 0.05 level. 

5. Version 1: $h_g(t,\mathbf{X}) = h_{0g}(t)\exp[\beta_{1g}\,\text{prison} + \beta_{2g}\,\text{dose}]$, 
g = 1, 2 

Version 2: $h_g(t,\mathbf{X}) = h_{0g}(t)\exp[\beta_1\,\text{prison} + \beta_2\,\text{dose} 
+ \beta_3(\text{clinic} \times \text{prison}) + \beta_4(\text{clinic} \times \text{dose})]$, g = 1, 2 
6. g = 1 (clinic 1): 

$\hat{h}_1(t,\mathbf{X}) = \hat{h}_{01}(t)\exp[(0.502)\text{prison} + (-0.036)\text{dose}]$ 

g = 2 (clinic 2): 

$\hat{h}_2(t,\mathbf{X}) = \hat{h}_{02}(t)\exp[(-0.083)\text{prison} + (-0.037)\text{dose}]$ 

7. The adjusted survival curves stratified by clinic are 
virtually identical for the no-interaction and interaction models. 
Consequently, both graphs (no-interaction versus 
interaction) indicate the same conclusion that clinic 2 has 
consistently larger survival (i.e., retention) probabilities than 
clinic 1 as time increases. 

8. $H_0: \beta_3 = \beta_4 = 0$ in the version 2 model (i.e., the 
no-interaction assumption is satisfied). $LR = -2\ln L_R - 
(-2\ln L_F)$ where R denotes the reduced (no-interaction) 
model and F denotes the full (interaction) model. Under 
the null hypothesis, LR is approximately a chi-square with 
2 degrees of freedom. 

Computed LR = 1195.428 - 1193.558 = 1.87; p-value = 
0.395; thus, the null hypothesis is not rejected and we 
conclude that the no-interaction model is preferable to the 
interaction model. 


Chapter 6 



1. For the chemo data, the -log-log KM curves intersect at 
around 600 days; thus the curves are not parallel, and this 
suggests that the treatment variable does not satisfy the PH 
assumption. 

2. The P(PH) value for the Tx variable is 0, indicating that the 
PH assumption is not satisfied for the treatment variable 
based on this goodness-of-fit test. 

3. $h(t,\mathbf{X}) = h_0(t)\exp[\beta_1(Tx)g_1(t) + \beta_2(Tx)g_2(t) 
+ \beta_3(Tx)g_3(t)]$ 

where 

$g_1(t)$ = 1 if 0 <= t < 250 days, 0 if otherwise 
$g_2(t)$ = 1 if 250 <= t < 500 days, 0 if otherwise 
$g_3(t)$ = 1 if t >= 500 days, 0 if otherwise 

4. Based on the printout the hazard ratio estimates and 
corresponding p-values and 95% confidence intervals are 
given as follows for each time interval: 
                        Haz. Ratio   P>|z|   [95% Conf. Interval] 
0 <= t < 250 days:        0.221      0.001     0.089     0.545 
250 <= t < 500 days:      1.629      0.278     0.675     3.934 
t >= 500 days:            1.441      0.411     0.604     3.440 


The results show a significant effect of treatment below 
250 days and a nonsignificant effect of treatment in each 
of the two intervals after 250 days. Because the coding 
for treatment was 1 = chemotherapy plus radiation versus 
2 = chemotherapy alone, the results indicate that the 
hazard for chemotherapy plus radiation is 1/0.221 = 4.52 
times the hazard for chemotherapy alone. The hazard ratio 
inverts to a value less than 1 (in favor of chemotherapy plus 
radiation after 250 days), but this result is nonsignificant. 
Note that for the significant effect of 1/0.221 = 4.52 
below 250 days, the 95% confidence interval ranges between 
1/0.545 = 1.83 and 1/0.089 = 11.24 when inverted, which 
is a very wide interval. 
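
Inverting a hazard ratio also inverts and swaps its confidence 
limits, which is how the interval (1.83, 11.24) above is 
obtained. The short Python sketch below illustrates the step; 
the function name invert_hazard_ratio is an assumption made for 
this example. 

```python
def invert_hazard_ratio(hr, lower, upper):
    """Reverse the comparison: 1/HR, with the limits inverted and swapped."""
    return 1.0 / hr, 1.0 / upper, 1.0 / lower

# Estimate and 95% limits for the first interval (below 250 days).
inv_hr, inv_lower, inv_upper = invert_hazard_ratio(0.221, 0.089, 0.545)

print(f"inverted HR = {inv_hr:.2f}, "
      f"95% CI = ({inv_lower:.2f}, {inv_upper:.2f})")
```

The old upper limit becomes the new lower limit because taking 
reciprocals reverses the ordering of positive numbers. 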

5. Model with two Heaviside functions: 

$h(t,\mathbf{X}) = h_0(t)\exp[\beta_1(Tx)g_1(t) + \beta_2(Tx)g_2(t)]$ 

where 

$g_1(t)$ = 1 if 0 <= t < 250 days, 0 if otherwise 
$g_2(t)$ = 1 if t >= 250 days, 0 if otherwise 

Model with one Heaviside function: 
$h(t,\mathbf{X}) = h_0(t)\exp[\beta_1(Tx) + \beta_2(Tx)g_1(t)]$ 
where $g_1(t)$ is defined above. 

6. The results for two time intervals give hazard ratios that 
are on the opposite side of the null value (i.e., 1). Below 250 
days, the use of chemotherapy plus radiation has, as in the 
previous analysis, 4.52 times the hazard when 
chemotherapy is used alone. This result is significant and the same 
confidence interval is obtained as before. Above 250 days, 
the use of chemotherapy alone has 1.532 times the hazard 
of chemotherapy plus radiation, but this result is 
nonsignificant. 

Chapter 7 


1. F: They are multiplicative models, although additive on the 
log scale. 

2. T 

3. T 

4. F: If the AFT assumption holds in a log-logistic model, the 
proportional odds assumption holds. 

5. F: An acceleration factor greater than one suggests the 
exposure is beneficial to survival. 

6. T 

7. T 

8. T 

9. F: ln(T) follows an extreme value minimum distribution. 

10. F: The subject is right-censored. 

11. 

$\gamma = \frac{\exp[\alpha_0 + \alpha_1(2) + \alpha_2 PRISON + \alpha_3 DOSE + \alpha_4 PRISDOSE]}{\exp[\alpha_0 + \alpha_1(1) + \alpha_2 PRISON + \alpha_3 DOSE + \alpha_4 PRISDOSE]} = \exp(\alpha_1)$ 

$\hat\gamma = \exp(0.698) = 2.01$ 

95% CI = exp[0.698 ± 1.96(0.158)] = (1.47, 2.74) 

The point estimate for the acceleration factor (2.01) 
suggests that the survival time (time off heroin) is double for 
those enrolled in CLINIC = 2 compared to CLINIC = 1. The 
95% confidence interval does not include the null value of 
1.0, indicating a statistically significant preventive effect for 
CLINIC = 2 compared to CLINIC = 1. 

12. 

$HR = \frac{\exp[\beta_0 + \beta_1(2) + \beta_2 PRISON + \beta_3 DOSE + \beta_4 PRISDOSE]}{\exp[\beta_0 + \beta_1(1) + \beta_2 PRISON + \beta_3 DOSE + \beta_4 PRISDOSE]} = \exp(\beta_1)$ 

$\widehat{HR} = \exp(-0.957) = 0.38$ 

95% CI = exp[-0.957 ± 1.96(0.213)] = (0.25, 0.58) 

The point estimate of 0.38 suggests the hazard of going 
back on heroin is reduced by a factor of 0.38 for those 
enrolled in CLINIC = 2 compared to CLINIC = 1. Or from 
the other perspective: the estimated hazard is elevated for 
those in CLINIC = 1 by a factor of exp(+0.957) = 2.60. 

13. $\beta_1 = -\alpha_1 p$ for CLINIC, so $\hat\beta_1 = -(0.698)(1.370467) = 
-0.957$, which matches the output for the PH form of the 
model. 
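
The conversion between the AFT and PH forms of the Weibull 
model can be sketched in a few lines of Python. This is an 
illustration using the estimates quoted in Questions 11 to 13; 
the function name aft_to_ph is an assumption introduced here. 

```python
import math

def aft_to_ph(alpha, shape_p):
    """Weibull model: PH coefficient from an AFT coefficient, beta = -alpha * p."""
    return -alpha * shape_p

ALPHA_CLINIC = 0.698   # AFT coefficient for CLINIC (Question 11)
SHAPE_P = 1.370467     # Weibull shape parameter p (Question 13)

beta_clinic = aft_to_ph(ALPHA_CLINIC, SHAPE_P)
hazard_ratio = math.exp(beta_clinic)
acceleration_factor = math.exp(ALPHA_CLINIC)

print(f"beta = {beta_clinic:.3f}")
print(f"HR = {hazard_ratio:.2f}, acceleration factor = {acceleration_factor:.2f}")
```

An acceleration factor above 1 corresponds to a hazard ratio 
below 1 here, because the shape parameter p is positive. 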
14. The product term PRISDOSE is included in the model as 
a potential confounder of the effect of CLINIC on survival. 
It is not an effect modifier because under this model the 
hazard ratio or acceleration factor for CLINIC does not 
depend on the value of PRISDOSE. The PRISDOSE term 
would cancel in the estimation of the hazard ratio or 
acceleration factor (see Questions 11 and 12). On the other 
hand, a product term involving CLINIC would be a 
potential effect modifier. 

15. Using the AFT form of the model: 

$\frac{1}{\lambda^{1/p}} = \exp[\alpha_0 + \alpha_1 CLINIC + \alpha_2 PRISON + \alpha_3 DOSE 
+ \alpha_4 PRISDOSE]$ 

Median survival time for CLINIC = 2, PRISON = 1, DOSE 
= 50, PRISDOSE = 100: 

$\hat{t} = [-\ln S(t)]^{1/p} \times \frac{1}{\lambda^{1/p}} = [-\ln(0.5)]^{1/p} 
\times \exp[\hat\alpha_0 + 2\hat\alpha_1 + \hat\alpha_2 + 50\hat\alpha_3 + 100\hat\alpha_4]$ 

$\hat{t}$ (median) = 403.66 days (obtained by substituting 
parameter estimates from output). 

16. Using the same approach as the previous question: 

Median survival time for CLINIC = 1, PRISON = 1, 
DOSE = 50, PRISDOSE = 100: 

$\hat{t} = [-\ln(0.5)]^{1/p} \times \exp[\hat\alpha_0 + 1\hat\alpha_1 + \hat\alpha_2 + 50\hat\alpha_3 + 100\hat\alpha_4]$ 
$\hat{t}$ (median) = 200.85 days. 

17. The ratio of the median survival times is 403.66/200.85 = 
2.01. This is the estimated acceleration factor for CLINIC = 
2 vs. CLINIC = 1 calculated in Question 11. Note that if we 
used any survival probability (i.e., any quantile of survival 
time), not just S(t) = 0.5 (the median), we would have 
obtained the same ratio. 
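
The claim that every quantile gives the same ratio can be 
illustrated numerically. In the Python sketch below, the 
CLINIC coefficient and shape parameter are the estimates quoted 
in Questions 11 and 13; the constant OTHER_TERMS stands in for 
the remaining part of the linear predictor and is a purely 
hypothetical value, since it cancels in the ratio. The function 
name weibull_quantile is an assumption. 

```python
import math

ALPHA_CLINIC = 0.698   # AFT coefficient for CLINIC (Question 11)
SHAPE_P = 1.370467     # Weibull shape parameter p (Question 13)
OTHER_TERMS = 4.0      # hypothetical value of the other terms; cancels below

def weibull_quantile(surv_prob, clinic):
    """Time t at which S(t) = surv_prob under the Weibull AFT model."""
    linear_predictor = OTHER_TERMS + ALPHA_CLINIC * clinic
    return (-math.log(surv_prob)) ** (1.0 / SHAPE_P) * math.exp(linear_predictor)

ratios = []
for surv_prob in (0.25, 0.50, 0.75):
    ratio = weibull_quantile(surv_prob, clinic=2) / weibull_quantile(surv_prob, clinic=1)
    ratios.append(ratio)
    print(f"S(t) = {surv_prob:.2f}: ratio of survival times = {ratio:.3f}")
```

Each ratio equals exp(0.698), the acceleration factor, 
regardless of which survival probability is chosen. 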

18. The addition of the frailty component did not change any 
of the other parameter estimates nor did it change the log 
likelihood of —260.74854. 

19. If the variance of the frailty is zero (theta = 0), then the 
frailty has no effect on the model. A variance of zero means 
the frailty ($\alpha$) is constant at 1. Frailty is defined as a 
multiplicative random effect on the hazard, $h(t|\alpha) = \alpha h(t)$. If 
$\alpha = 1$ then $h(t|\alpha) = h(t)$, and there is no frailty. 





Chapter 8 


1. Cox PH Model for CP approach to Defibrillator Study: 

$h(t,\mathbf{X}) = h_0(t)\exp[\beta\,\text{tx} + \gamma\,\text{smoking}]$ 

where tx = 1 if treatment A, 0 if treatment B, 
smoking status = 1 if ever smoked, 0 if never smoked 

2. Using the CP approach, there is no significant effect of 
treatment status adjusted for smoking. The estimated hazard 
ratio for the effect of treatment is 1.09, the corresponding 
P-value is 0.42, and a 95% CI for the hazard ratio is (0.88, 
1.33). 

3. No-interaction SC model for marginal approach: 
$h_g(t,\mathbf{X}) = h_{0g}(t)\exp[\beta\,\text{tx} + \gamma\,\text{smoking}]$, g = 1, 2, 3 
Interaction SC model for marginal approach: 
$h_g(t,\mathbf{X}) = h_{0g}(t)\exp[\beta_g\,\text{tx} + \gamma_g\,\text{smoking}]$, g = 1, 2, 3 

4. $LR = -2\ln L_R - (-2\ln L_F)$ is approximately $\chi^2$ with 4 df 
under 

$H_0$: no-interaction SC model is appropriate, where 
R denotes the reduced (no interaction SC) model and 
F denotes the full (interaction SC) model. 

5. The use of a no-interaction model does not allow you to 
obtain stratum-specific HR estimates, even though you are 
assuming that strata are important. 

6. The CP approach makes sense for these data because 
recurrent defibrillator (shock) events on the same subject are 
the same kind of event no matter when they occurred. 

7. You might use the marginal approach if you determined 
that different recurrent events on the same subject were 
different because they were of different order. 

8. The number in the risk set ($n_j$) remains unchanged through 
day 68 because every subject who failed by this time was 
still at risk for a later event. 

9. Subjects 3, 6, 10, 26, and 31 all fail for the third time at day 
98 and are not followed afterwards. 

10. Subjects 9, 15, and 28 fail for the second time at 79 days, 
whereas subject #16 is censored at 79 days. 

11. Subjects 4, 14, 15, 24, and 29 were censored between days 
111 and 112. 

12. Subject #5 gets his first event at 45 days and his second 
event at 68 days, after which he drops out of the study. This 
subject is the first of the 36 subjects to drop out of the study, 
so the number in the risk set changes from 36 to 35 after 
68 days. 
13. i. None of the above. 

ii. The product limit formula is not applicable to the CP 
data; in particular, Pr(T > t | T >= t) does not equal "# 
failing in time interval/# in the risk set at start of interval." 

14. Use the information provided in Table T.2 to complete the 
data layouts for plotting the following survival curves. 

a. $S_1(t) = \Pr(T_1 > t)$ where $T_1$ = time to first event from 
study entry 


t(j)   n_j   m_j   q_j   S(t(j)) = S(t(j-1)) x Pr(T1 > t | T1 >= t) 
  0    36     0     0    1.00 
 33    36     2     0    0.94 
 34    34     3     0    0.86 
 36    31     3     0    0.78 
 37    28     2     0    0.72 
 38    26     4     0    0.61 
 39    22     5     0    0.47 
 40    17     1     0    0.44 
 41    16     1     0    0.42 
 43    15     1     0    0.39 
 44    14     1     0    0.36 
 45    13     2     0    0.31 
 46    11     2     0    0.25 
 48     9     1     0    0.22 
 49     8     1     0    0.19 
 51     7     2     0    0.19 x 5/7 = 0.14 
 57     5     2     0    0.14 x 3/5 = 0.08 
 58     3     2     0    0.08 x 1/3 = 0.03 
 61     1     1     0    0.03 x 0/1 = 0.00 
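
The survival column in this layout follows the product limit 
rule: each entry is the previous entry multiplied by 
(n_j - m_j)/n_j. The Python sketch below reproduces the column 
from the n_j and m_j values in the layout for time to first 
event; the function name product_limit is an assumption, and 
the calculation carries full precision rather than rounding at 
each step. 

```python
def product_limit(n_at_risk, n_failed):
    """Product limit estimates: S_j = S_(j-1) * (n_j - m_j) / n_j."""
    survival = []
    current = 1.0
    for n, m in zip(n_at_risk, n_failed):
        current *= (n - m) / n
        survival.append(current)
    return survival

# n_j and m_j columns from the layout for time to first event.
n_j = [36, 36, 34, 31, 28, 26, 22, 17, 16, 15, 14, 13, 11, 9, 8, 7, 5, 3, 1]
m_j = [0, 2, 3, 3, 2, 4, 5, 1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 2, 1]

km = [round(s, 2) for s in product_limit(n_j, m_j)]
print(km)
```

Because no subject is censored in this layout, each estimate 
is simply the number still event-free divided by 36. 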


b. Conditional $S_{2c}(t) = \Pr(T_{2c} > t)$ where $T_{2c}$ = time to 
second event from first event. 

t(j)   n_j   m_j   q_j   S(t(j)) = S(t(j-1)) x Pr(T1 > t | T1 >= t) 
  0    36     0     0    1.00 
  5    36     1     0    0.97 
  9    35     1     0    0.94 
 18    34     2     0    0.89 
 20    32     1     0    0.86 
 21    31     2     1    0.81 
 23    28     1     0    0.78 
 24    27     1     0    0.75 
 25    26     1     0    0.72 
 26    25     2     0    0.66 
 27    23     2     0    0.60 
 28    21     1     0    0.58 
 29    20     1     0    0.55 
 30    19     1     0    0.52 
 31    18     3     0    0.43 
 32    15     1     0    0.40 
 33    14     5     0    0.26 
 35     9     1     0    0.23 
 39     8     2     0    0.17 
 40     6     2     0    0.17 x 4/6 = 0.12 
 41     4     1     0    0.12 x 3/4 = 0.09 
 42     3     1     0    0.09 x 2/3 = 0.06 
 46     2     1     0    0.06 x 1/2 = 0.03 
 47     1     1     0    0.03 x 0/1 = 0.00 


c. Marginal $S_{2m}(t) = \Pr(T_{2m} > t)$ where $T_{2m}$ = time to 
second event from study entry. 

t(j)   n_j   m_j   q_j   S(t(j)) = S(t(j-1)) x Pr(T1 > t | T1 >= t) 
  0    36     0     0    1.00 
 63    36     2     0    0.94 
 64    34     3     0    0.86 
 65    31     2     0    0.81 
 66    29     3     0    0.72 
 67    26     4     0    0.61 
 68    22     2     0    0.56 
 69    20     1     0    0.53 
 70    19     1     0    0.50 
 71    18     1     0    0.47 
 72    17     2     0    0.42 
 73    15     1     0    0.39 
 74    14     1     0    0.36 
 76    13     1     0    0.33 
 77    12     1     0    0.31 
 78    11     2     0    0.25 
 79     9     3     1    0.25 x 6/9 = 0.17 
 80     5     2     0    0.17 x 3/5 = 0.10 
 81     3     2     0    0.10 x 1/3 = 0.03 
 97     1     1     0    0.03 x 0/1 = 0.00 
15. The survival curves corresponding to the above data 
layouts will differ because they are describing different 
survival functions. In particular, the composition of the risk 
set differs in all three data layouts and the ordered survival 
times being plotted are different as well. 


Chapter 9 


1. Cause-specific no-interaction model for local recurrence of 
bladder cancer (event = 1): 

$h_1(t,\mathbf{X}) = h_{01}(t)\exp[\beta_{11}\,\text{tx} + \beta_{21}\,\text{num} + \beta_{31}\,\text{size}]$ 

2. Censored subjects have bladder metastasis (event = 2) or 
other metastasis (event = 3). 

3. Cause-specific no-interaction model for bladder metastasis 
(event = 2): 

$h_2(t,\mathbf{X}) = h_{02}(t)\exp[\beta_{12}\,\text{tx} + \beta_{22}\,\text{num} + \beta_{32}\,\text{size}]$ 

where censored subjects have local recurrence of bladder 
cancer (event = 1) or other metastasis (event = 3). 

4. A sensitivity analysis would consider worst-case violations
of the independence assumption. For example, subjects
censored because they failed from events 2 or 3 might be
treated in the analysis as either all being event-free (i.e.,
change event status to 0 and time to 53) or all experiencing
the event of interest (i.e., change event status to 1 and leave
time as is).
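
The two worst-case recodings described in this answer can be sketched in Python. This is a minimal illustration, not the analysis code used for the text: the record layout (dictionaries with `time` and `event` keys), the function name `recode_for_sensitivity`, and the toy records are all assumptions made here. The maximum follow-up time of 53 is taken from the answer above.

```python
def recode_for_sensitivity(records, scenario, max_followup=53):
    """Return a recoded copy of `records` for one worst-case scenario.

    "event_free": subjects failing from events 2 or 3 are treated as
        event-free (event status 0, time set to max_followup).
    "all_fail":   subjects failing from events 2 or 3 are treated as
        having the event of interest (event status 1, time unchanged).
    """
    if scenario not in ("event_free", "all_fail"):
        raise ValueError(f"unknown scenario: {scenario!r}")
    recoded = []
    for rec in records:
        new = dict(rec)  # copy so the original data stay untouched
        if new["event"] in (2, 3):
            if scenario == "event_free":
                new["event"] = 0
                new["time"] = max_followup
            else:
                new["event"] = 1
        recoded.append(new)
    return recoded


# Hypothetical toy records, not taken from the Byar data.
toy = [
    {"id": 1, "time": 8, "event": 1},
    {"id": 2, "time": 12, "event": 2},
    {"id": 3, "time": 30, "event": 0},
    {"id": 4, "time": 21, "event": 3},
]
event_free = recode_for_sensitivity(toy, "event_free")
all_fail = recode_for_sensitivity(toy, "all_fail")
```

Each recoded dataset would then be analyzed with the same cause-specific model, and the two sets of results compared with the original analysis.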

5. a. Verify the $CIC_1$ calculation provided at failure time
$t_j = 8$ for persons in the treatment group (tx = 1):

$h_1(8) = 1/23 = 0.0435$
$S(4) = S(3)\Pr(T > 4 \mid T \ge 4) = 0.9630(1 - 2/26) = 0.9630(0.9231) = 0.8889$
$I_1(8) = h_1(8)\,S(4) = 0.0435(0.8889) = 0.0387$
$CIC_1(8) = CIC_1(4) + 0.0387 = 0 + 0.0387 = 0.0387$

b. Verify the $CIC_1$ calculation provided at failure time
$t_j = 25$ for persons in the placebo group (tx = 0):

$h_1(25) = 1/6 = 0.1667$
$S(23) = S(21)\Pr(T > 23 \mid T \ge 23) = 0.4150(1 - 1/8) = 0.4150(0.875) = 0.3631$
$I_1(25) = h_1(25)\,S(23) = 0.1667(0.3631) = 0.0605$
$CIC_1(25) = CIC_1(23) + 0.0605 = 0.2949 + 0.0605 = 0.3554$

c. Interpret the $CIC_1$ values obtained for both the treatment
and placebo groups at $t_j = 30$.

For tx = 1, $CIC_1(t_j = 30) = 0.3087$ and for tx = 0,
$CIC_1(t_j = 30) = 0.3554$.

Thus, for treated subjects (tx = 1), the cumulative risk
(i.e., marginal probability) for local bladder cancer
recurrence is about 30.9% at 30 months when allowing for
the presence of competing risks for bladder metastasis
or other metastasis.

For placebo subjects (tx = 0), the cumulative risk (i.e.,
marginal probability) for local bladder cancer recurrence
is about 35.5% at 30 months when allowing for
the presence of competing risks for bladder metastasis
or other metastasis.

The placebo group therefore has a cumulative risk of failure
about 5 percentage points higher than the treatment group
by 30 months of follow-up.

d. Calculating the $CPC_1$ values for both treatment and
placebo groups at $t_j = 30$:

The formula relating CPC to CIC is given by
$CPC_c = CIC_c/(1 - CIC_{c'})$, where $CIC_c$ = CIC for
cause-specific risk event = 1 and $CIC_{c'}$ = CIC from risks
for events = 2 or 3 combined.

For tx = 1, $CIC_1(t_j = 30) = 0.3087$ and for tx = 0,
$CIC_1(t_j = 30) = 0.3554$.

The calculation of $CIC_{c'}$ involves recoding the event
variable to 1 for subjects with bladder metastasis or other
metastasis and 0 otherwise and then computing $CIC_{c'}$,
as shown in the following tables.


tx = 1 (Treatment A)

t_j   n_j   d_{1'j}   h_{1'}(t_j)   S(t_{j-1})   I_{1'}(t_j)   CIC_{1'}(t_j)
0     27    0         0             -            -             -
2     27    1         .0370         1            .0370         .0370
3     26    2         .0769         .9630        .0741         .1111
4     24    0         0             .8889        0             .1111
8     23    1         .0435         .8889        .0387         .1498
9     21    1         .0476         .8116        .0386         .1884
10    20    1         .0500         .7729        .0386         .2270
15    17    1         .0588         .7343        .0432         .2702
16    15    1         .0667         .6479        .0432         .3134
18    14    0         0             .6047        0             .3134
22    12    0         0             .6047        0             .3134
23    11    0         0             .5543        0             .3134
24    8     0         0             .5039        0             .3134
26    7     0         0             .4409        0             .3134
28    4     1         .2500         .3779        .0945         .4079
29    2     0         0             .2835        0             .4079
30    1     0         0             .2835        0             .4079

tx = 0 (Placebo)

t_j   n_j   d_{1'j}   h_{1'}(t_j)   S(t_{j-1})   I_{1'}(t_j)   CIC_{1'}(t_j)
0     26    0         0             -            -             -
1     26    0         0             1            0             0
2     24    0         0             .9615        0             0
3     23    0         0             .9215        0             0
5     21    1         .0476         .8413        .0400         .0400
6     20    2         .1000         .8013        .0801         .1201
7     18    1         .0556         .7212        .0401         .1602
10    16    1         .0625         .6811        .0426         .2028
12    15    1         .0667         .6385        .0426         .2454
14    13    0         0             .6835        0             .2454
16    12    1         .0833         .5534        .0461         .2915
17    10    0         0             .4612        0             .2915
18    9     0         0             .4150        0             .2915
21    8     1         .1250         .4150        .0519         .3434
23    7     0         0             .3632        0             .3434
25    6     1         .1667         .3632        .0605         .4039
29    4     0         0             .2421        0             .4039
30    2     0         0             .2421        0             .4039


From these tables, we find that for tx = 1, $CIC_{1'}(t_j = 30) = 0.4079$
and for tx = 0, $CIC_{1'}(t_j = 30) = 0.4039$.

Thus, for tx = 1, $CPC_1(t_j = 30) = CIC_1/(1 - CIC_{1'}) =
0.3087/(1 - 0.4079) = 0.5213$ and for tx = 0, $CPC_1(t_j = 30) =
CIC_1/(1 - CIC_{1'}) = 0.3554/(1 - 0.4039) = 0.5962$.
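
The arithmetic in part d can be retraced with a short Python sketch. It is an illustration under stated assumptions rather than the program used for the text: the function names and the row layout (t_j, n_j, d_j, S(t_{j-1})) are choices made here. The rows are the failure times with at least one competing event in the tx = 1 table above (rows with no such events add zero to the sum), and the S(t_{j-1}) values are copied from that table rather than recomputed.

```python
def cumulative_incidence(rows):
    """Sum the incidence terms I(t_j) = h(t_j) * S(t_{j-1}).

    rows: (t_j, n_j, d_j, S(t_{j-1})) for the event type of interest.
    Returns the CIC at the last listed failure time.
    """
    cic = 0.0
    for t, n_at_risk, d_events, surv_prev in rows:
        hazard = d_events / n_at_risk   # h(t_j) = d_j / n_j
        cic += hazard * surv_prev       # add I(t_j)
    return cic


def conditional_probability(cic_c, cic_c_prime):
    """CPC_c = CIC_c / (1 - CIC_c')."""
    return cic_c / (1.0 - cic_c_prime)


# tx = 1 rows with competing events (events 2 or 3), from the table above.
competing_tx1 = [
    (2, 27, 1, 1.0), (3, 26, 2, 0.9630), (8, 23, 1, 0.8889),
    (9, 21, 1, 0.8116), (10, 20, 1, 0.7729), (15, 17, 1, 0.7343),
    (16, 15, 1, 0.6479), (28, 4, 1, 0.3779),
]

cic_competing_tx1 = cumulative_incidence(competing_tx1)
cpc_tx1 = conditional_probability(0.3087, cic_competing_tx1)
cpc_tx0 = conditional_probability(0.3554, 0.4039)
print(f"{cic_competing_tx1:.4f} {cpc_tx1:.4f} {cpc_tx0:.4f}")
```

Because the tabled values are rounded to four decimals, the recomputed figures may differ from the text in the last decimal place.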
6. a. What is the effect of treatment on survival from having
a local recurrence of bladder cancer and is it significant?

$HR_1$(tx = 1 vs. tx = 0) = 0.535 (= 1/1.87),
p-value = 0.250, N.S.

b. What is the effect of treatment on survival from developing
metastatic bladder cancer and is it significant?

$HR_2$(tx = 1 vs. tx = 0) = 0.987,
p-value = 0.985, N.S.

c. What is the effect of treatment on survival from other
metastatic cancer and is it significant?

$HR_3$(tx = 1 vs. tx = 0) = 0.684 (= 1/1.46),
p-value = 0.575, N.S.
7. a. State the hazard model formula for the LM model used
for the above output.

$h_g^*(t,X) = h_{0g}^*(t)\exp[\beta_1\,tx + \beta_2\,num + \beta_3\,size + \delta_1(txd2) + \delta_2(numd2) + \delta_3(sized2) + \delta_4(txd3) + \delta_5(numd3) + \delta_6(sized3)]$, $g = 1, 2, 3$

where d2 = 1 if bladder metastasis and 0 otherwise,
and d3 = 1 if other metastasis and 0 otherwise.

b. Determine the hazard ratios for the effect of each of the
3 cause-specific events based on the above output.

$HR_1$(tx = 1 vs. tx = 0) = exp(-0.6258) = 0.535 (= 1/1.87)

$HR_2$(tx = 1 vs. tx = 0) = exp(-0.6258 + 0.6132) = 0.987

$HR_3$(tx = 1 vs. tx = 0) = exp(-0.6258 + 0.2463) = 0.684 (= 1/1.46)

c. Verify that the hazard ratios computed in Part b are
identical to the hazard ratios computed in Question 6.
The HRs are identical.
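
The hazard ratios in part b combine the main-effect coefficient of tx with the product-term coefficients, which can be checked with a few lines of Python. This is a sketch for verification only: the dictionary keys are labels chosen here, and the coefficient values are the ones quoted in part b.

```python
import math

# Coefficients quoted in part b; the key names are labels assumed here.
coef = {"tx": -0.6258, "txd2": 0.6132, "txd3": 0.2463}

# Event 1 is the referent event type, so only the main effect applies.
hr1 = math.exp(coef["tx"])
# Events 2 and 3 add the coefficient of the matching product term.
hr2 = math.exp(coef["tx"] + coef["txd2"])
hr3 = math.exp(coef["tx"] + coef["txd3"])

for label, hr in (("HR1", hr1), ("HR2", hr2), ("HR3", hr3)):
    print(f"{label} = {hr:.3f}  (1/HR = {1 / hr:.2f})")
```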

8. a. State the hazard model formula for the LM_alt model
used for the above output.

$h_g(t,X) = h_{0g}(t)\exp[\delta'_{11}\,txd1 + \delta'_{12}\,numd1 + \delta'_{13}\,sized1 + \delta'_{21}\,txd2 + \delta'_{22}\,numd2 + \delta'_{23}\,sized2 + \delta'_{31}\,txd3 + \delta'_{32}\,numd3 + \delta'_{33}\,sized3]$, $g = 1, 2, 3$

where d1 = 1 if local bladder cancer recurrence and 0
otherwise,
d2 = 1 if bladder metastasis and 0 otherwise,
and d3 = 1 if other metastasis and 0 otherwise.

b. Determine the hazard ratios for the effect of each of
the three cause-specific events based on the above
output.

$HR_1$(tx = 1 vs. tx = 0) = exp(-0.6258) = 0.535 (= 1/1.87)

$HR_2$(tx = 1 vs. tx = 0) = exp(-0.0127) = 0.987

$HR_3$(tx = 1 vs. tx = 0) = exp(-0.3796) = 0.684 (= 1/1.46)

c. Verify that the hazard ratios computed in Part b are
identical to the hazard ratios computed in Questions 6
and 7.

Corresponding hazard ratios are identical.

9. State the formula for a no-interaction SC LM model for
these data.

$h_g^*(t,X) = h_{0g}^*(t)\exp[\beta_1\,tx + \beta_2\,num + \beta_3\,size]$, $g = 1, 2, 3$

Assumes $HR_1(X) = HR_2(X) = HR_3(X)$ for any X variable,
e.g., tx = 1 vs. tx = 0:

$HR_1(tx) = HR_2(tx) = HR_3(tx) = \exp[\beta_1]$

10. Describe how you would test whether a no-interaction SC
LM model would be more appropriate than an interaction
SC LM model.

Carry out the following likelihood ratio test:

$H_0$: $\delta_{gj} = 0$, g = 2, 3; j = 1, 2, 3

where $\delta_{gj}$ is the coefficient of $D_g X_j$ in the
interaction SC LM model

Likelihood ratio test: $LR = -2\log L_R - (-2\log L_F)$,
approximately $\chi^2_6$ under $H_0$

R = no-interaction SC (reduced) model
F = interaction SC (full) model
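
The likelihood ratio test described in this answer can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions: the two log-likelihood values are hypothetical placeholders, not output from fitting the bladder cancer data, and the function names are chosen here. The null hypothesis sets six product-term coefficients to zero, so the statistic is referred to a chi-square distribution with 6 degrees of freedom. For even degrees of freedom the upper-tail probability has a closed form, which avoids needing a statistics package.

```python
import math


def lr_statistic(loglik_reduced, loglik_full):
    """LR = -2 log L_R - (-2 log L_F)."""
    return -2.0 * loglik_reduced - (-2.0 * loglik_full)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical log-likelihoods, for illustration only.
loglik_reduced = -120.0   # no-interaction SC LM model (R)
loglik_full = -115.0      # interaction SC LM model (F)

lr = lr_statistic(loglik_reduced, loglik_full)
p_value = chi2_sf_even_df(lr, df=6)
print(f"LR = {lr:.2f}, df = 6, p = {p_value:.4f}")
```

A nonsignificant result would support the simpler no-interaction model; a significant result would favor keeping the product terms.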
Accelerated failure time 
(AFT), see AFT 
entries 

Acceleration factor, 266-267 
exponential, 269-271 
with frailty, 303, 309 
log-logistic, 281 
Weibull, 276 

Addicts dataset, 464 
data analysis, 230-234, 
248 

with SAS programming, 
508-538 

with SPSS programming, 
542-555 
with Stata programming,
467-502

Additive failure time model, 
285 


Adjusted survival curves, 
104 

log-log plots, 159 

observed versus expected 
plots, 150 

stratified Cox procedure, 
180 

using Cox PH model, 
103-107, 116, 118 

AFT (accelerated failure 
time) assumption, 
266-268 

AFT models, 265-266, 
282-284, 309, 
313-317 

Age-Related Eye Disease 
Study (AREDS), 
359-364 

AIC (Akaike’s information 
criterion), 286 
Average hazard rate, 24 

in Cox PH model, 94-95, 134 
in extended Cox model, 219 
in stratified Cox model, 181 

Baseline hazard function, 94, 134 
Biased results, 405 
Binary regression, 290 
Bladder cancer dataset, 464-465 
Bladder cancer patients 

comparison of results for, 355-357 
counting process for first 26 
subjects, 339 

hypothetical subjects, 336-337 
interaction model results for, 354 
no-interaction model results for, 354 
Byar data, 399-400

cause-specific competing risk analysis, 
401-403 

Lunn-McNeil models, 422, 427 

Cause-specific hazard function, 400 
Censored data, 5-8 

interval-censored, 286, 289 
left-censored, 7, 286 
right-censored, 7, 286 
Censoring, 5 

non-informative (independent), 
403-407 

informative (dependent), 405-406 
CIC (cumulative incidence curves), 393, 
412-420 

Competing risks, 4, 392, 396 
examples of data, 396-398 
CIC, 412

CPC (Conditional Probability Curves), 
420 

independence assumption, 403 
Lunn-McNeil models, 421, 427 
separate models for different event 
types, 400-403

Complementary log-log binary model, 293 
Complementary log-log link function, 292 
Computer 

augmented (Lunn-McNeil approach) 
data layout for, 422 


counting process data layout for, 
337-338 

general data layout for, 15-19 
marginal approach data layout for, 
348-349 

Conditional failure rate, 11 
Conditional probability curves (CPC), 
420-421 

Conditional survival function, 295 
Confounding effect, 26-27 
Counting process (CP) approach, 334-336 
example, 336-337 
general data layout, 338-344 
results for example, 346-347 
Cox adjusted survival curves 
using SAS, 525-529 
using SPSS, 549-550 
using Stata, 485-488 
Cox likelihood, 111-115 

extended for time dependent variables, 
239-242 

Cox PH cause-specific model, 400 
Cox proportional hazards (PH) model 
adjusted survival curves using, 

103-107 

computer example using, 86-94 
extension of, see Extended Cox model 
formula for, 94-96 
maximum likelihood estimation of, 
98-100 

popularity of, 96-98 
review of, 214-216 
using SAS, 514-518 
using SPSS, 548 
using Stata, 476-481 
CP, see Counting process approach 
CPC (conditional probability curves), 
420-421 

Cumulative incidence, viii 
Cumulative incidence curves (CIC), 393, 
412-420 

Data layout for computer, 15-19 
Datasets, 464-465 
Decreasing Weibull model, 13 
Discrete survival analysis, 293 
Empirical estimation, 344
Estimated -ln(-ln) survivor curves, 136
Estimated survivor curves, 25
Evans County Study, 

Cox proportional hazards (PH) model 
application to, 124-126 
Kaplan-Meier survival curves for, 

73-75 

multivariable example using, 29-31 
ordered failure times for, 41-42 
survival data from, 119-122 
Event, 4 

Event types, different, separate models for, 
400-403 

Expected versus observed plots, 145-150 
Exponential regression, 12 

accelerated failure-time form, 319 
log relative-hazard form, 319 
Extended Cox likelihood, 239-242 
Extended Cox model, 110, 219 
application to Stanford heart 
transplant data, 235-239 
application to treatment of heroin 
addiction, 230-234 
hazard ratio formula for, 221-223 
using SAS, 530-533 
using SPSS, 552-555 
using Stata, 488-492 
time-dependent variables, 

219-221 

Failure, 4 

Failure rate, conditional, 11 
Fleming-Harrington test, 65
Frailty component, 295 
Frailty effect, 300 
Frailty models, 294-308 
using Stata, 499-502 

Gamma distribution, 296 
Gamma frailty, 301 
Gastric carcinoma data, 253-254 
General stratified Cox (SC) model, 

180-181 

Generalized gamma model, 284 
GOF, see Goodness-of-fit entries 


Gompertz model, 285 
Goodness-of-fit (GOF) testing approach, 
151-153 

Goodness-of-fit (GOF) tests, 136 

Hazard function, 8, 9-10 
cause-specific, 400 
probability density function and, 
262-263 
Hazard ratio, 32 

formula for Cox PH model, 100-103
formula for extended Cox 
model, 221-223 
Heaviside function, 227 

Increasing Weibull model, 13 
Independence assumption, 403-411 
Information matrix, 346 
Informative censoring, in competing 
risks survival analysis, 405 
Instantaneous potential, 49 
Interactions, 27 

Interval-censored data, 286, 289-294 
Inverse-Gaussian distribution, 296 

Kaplan-Meier (KM) curves, 46 
example of, 51-55 
general features of, 56-57 
log-log survival curves, 141 

Left-censored data, 7-8, 286 
Leukemia remission-time data, 17, 26 
Cox proportional hazards (PH) model 
application to, 86-94 
increasing Weibull for, 13 
Kaplan-Meier survival curves for, 
51-55, 75-77

log-log KM curves for, 141-145 
recurrent event data for, 335 
exponential survival, 263-265 
stratified Cox (SC) model application 
to, 176-188 

Likelihood function, 111-115 
for Cox PH model, 
for extended Cox model, 239-242 
for parametric models, 286-289 
Likelihood ratio (LR) statistic, 89 
LM, see Lunn-McNeil approach 
Log-log plots, 137-145 
Log-log survival curves, 137-145 
Log-logistic regression, 277-282 
accelerated failure-time form, 320 
Log-rank test, 46 
alternatives to, 63-68 
for several groups, 61-63 
for two groups, 57-61 
Logit link function, 292 
Lognormal survival models, 13, 284 
Lunn-McNeil (LM) approach, 393, 
421-427 

alternative, 427-434 

Macular degeneration data set, 359-364 
results for, 361-362 
Marginal probability, 414 
Maximum likelihood (ML) estimation of 
Cox PH model, 98-100 
Multiplicative model, 285 

No-interaction assumption in stratified 
Cox model, 182-188 
Noninformative censoring, 403-407 

Observed versus expected plots, 145-150 

Parametric approach using shared frailty, 
357-359 

Parametric survival models, 
defined, 260 
examples 

exponential model, 263-265, 268-277 
log-logistic model, 277-282 
Weibull model, 272-277 
likelihood function, 286-289 
other models, 284-286 
SAS use, 533-538 
Stata use, 492-499 
Pepe-Mori test, 421, 437 
Peto test, 63 
PH assumption 

assessment using time-dependent 
covariates, 153-157, 224-229 


evaluating (Chapter 4), 132-171 
meaning of, 107-111 
assessment using goodness of fit test 
with Schoenfeld residuals, 151-153
assessment using Kaplan-Meier log-log 
survival curves, 546-547 
assessment using observed versus 
expected plots, 145-150 
assessment using SAS, 522-525 
assessment using SPSS, 550-552 
assessment using Stata, 473-476, 
483-485 

PH model, Cox, see Cox proportional 
hazards (PH) model 

PO (proportional odds) assumption, 292 
Precision, 92 
Probability, 11 

Probability density function, 262-263 
PROC LIFEREG (SAS), 533-538 
PROC LIFETEST (SAS), 510-514 
PROC PHREG, 514-518 
Product limit formula, 46 
Proportional hazards, see PH entries 
Proportional odds (PO) assumption, 292 

Recurrent event survival analysis 
(Chapter 8), 332-390 
Counting process approach, 336-344 
definition of recurrent events, 4, 332 
examples of recurrent event data, 
334-336 

other approaches for analysis, 347-353
parametric approach using shared 
frailty, 357-359 
SAS modeling, 538-542 
Stata modeling, 502-508 
survival curves with, 364-367 
Right-censored data, 7, 286 
Risk set, 22 

Robust estimation, 344-346 
Robust standard error, 347 
Robust variance, 345 

SAS, 508-542 

assessing PH assumption, with 
statistical tests, 522-525 
demonstrating PROC LIFETEST,
510-514

modeling recurrent events, 538-542 
obtaining Cox adjusted survival 
curves, 525-529 

running Cox proportional hazards (PH) 
model, with PROC PHREG, 
514-518 

running extended Cox model, 530-533 
running parametric models, with PROC 
LIFEREG, 533-538 
running stratified Cox (SC) model, 
519-521 

SC, see Stratified Cox model 
Schoenfeld residuals, 151-152, 522-523
Score residuals, 346 
Semi-parametric model, 96, 261 
Sensitivity analysis (with competing 
risks), 408-411 
Shape parameter, 272 
Shared frailty, 306-308, 358 
recurrent events analysis using, 

357-359 

Shared frailty model, 306 
SPSS, 542-555

assessing PH assumption 
with statistical tests, 550-552 
using Kaplan-Meier log-log survival 
curves, 546-547 

estimating survival functions, 544-546 
running Cox proportional hazards (PH) 
model, 548 

running extended Cox model, 552-555 
running stratified Cox (SC) model, 
549-550 

Stanford Heart Transplant Study 
extended Cox model application to, 
235-239 

transplants versus nontransplants, 
244-245 

Stata, 60, 465-508 

assessing PH assumption 

using graphical approaches, 473-476 
using statistical tests, 483-485
estimating survival functions, 469-473 
modeling recurrent events, 502-508 


obtaining Cox adjusted survival curves 
with, 485-488 

running Cox proportional hazards (PH) 
model, 476-481 
running extended Cox model, 

488-492 

running frailty models, 499-502 
running parametric models, 492-499 
running stratified Cox (SC) model, 
481-483 

Step functions, 9 
Strata variable, 347 

Stratification variables, several, 188-193 
Stratified Cox (SC) model, 176-180, 
364-367 

for analyzing recurrent event data, 
347-353 

conditional approaches, 347-351 
general, 180-181 
graphical view of, 193-194 
marginal approach, 347-351 
using SAS, 519-521 
using SPSS, 549-550 
using Stata, 481-483
Subdistribution function, 419
Survival curves 
adjusted, 104 

using Cox PH model, 103-107 
Cox adjusted, see Cox adjusted 
survival curves 

with recurrent events, 364-367 
Survival functions 
conditional, 295 

probability density function and, 
262-263 

SAS estimation, 525-529 
SPSS estimation, 544-546 
Stata estimation, 469-473 
unconditional, 295 
Survival time, 4 
Survival time variable, 14 
Survivor function, 8-9 

Tarone-Ware test statistic, 64 
Time-dependent covariates, assessing PH 
assumption using, 153-157 
Time-dependent variables, 134 

definition and examples of, 216-219 
extended Cox model for, 219-221 
Time-independent variables, PH 
assumption and, 224-229 

Unconditional survival function, 295 
Unshared frailty, 306 

Veterans Administration Lung Cancer 
Data 

Kaplan-Meier survival curves for, 62-63 
model with no frailty, 296 


proportional hazards assumption 
evaluation for, 161-164 
with several stratification variables, 
188-193 

stratified Cox (SC) model application 
to, 201-204 

Wald statistic, 89 
Weibull model, 272-277 
Weibull regression 

accelerated failure-time form, 322, 325 
log relative-hazard form, 323, 325 
Wilcoxon test, 64 



